Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs

Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the training of multiple three-layer ReLU sub-networks with weight decay regularization can be equivalently cast as a convex optimization problem in a higher dimensional space, where sparsity is enforced via a group ℓ1-norm regularization. Consequently, ReLU networks can be interpreted as high-dimensional feature selection methods. More importantly, we then prove that the equivalent convex problem can be globally optimized by a standard convex optimization solver in time polynomial in the number of samples and the data dimension when the network width is fixed. Finally, we numerically validate our theoretical results via experiments involving both synthetic and real datasets.

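To give a concrete sense of what such convex reformulations look like, the sketch below solves the known two-layer ReLU convex program (a group-ℓ1-regularized problem over hyperplane-arrangement patterns) with CVXPY; it is not the paper's exact three-layer formulation, and the activation patterns are randomly subsampled here purely for illustration rather than exhaustively enumerated.

```python
# Minimal illustrative sketch (NOT the paper's three-layer program): the
# two-layer ReLU convex reformulation with a group L1 penalty, solved in CVXPY.
# Arrangement matrices D_i are subsampled; the exact program enumerates all
# activation patterns of the data matrix X.
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, d, P = 50, 10, 20        # samples, data dimension, sampled ReLU patterns
beta = 1e-3                 # weight-decay / regularization strength

X = np.random.randn(n, d)
y = np.random.randn(n)

# Sample diagonal arrangement matrices D_i = diag(1[X u_i >= 0]).
U = np.random.randn(d, P)
D = (X @ U >= 0).astype(float)          # n x P; column i is the diagonal of D_i

V = cp.Variable((d, P))                  # positive-branch neuron weights
W = cp.Variable((d, P))                  # negative-branch neuron weights

# Network output under fixed activation patterns: sum_i D_i X (v_i - w_i).
residual = cp.sum(cp.multiply(D, X @ (V - W)), axis=1) - y

# Group L1 (sum of per-neuron Euclidean norms) enforces neuron-level sparsity.
group_l1 = cp.sum(cp.norm(V, 2, axis=0) + cp.norm(W, 2, axis=0))

# Constraints keeping each neuron consistent with its activation pattern:
# (2 D_i - I) X v_i >= 0 and likewise for w_i.
constraints = []
for i in range(P):
    A = (2 * np.diag(D[:, i]) - np.eye(n)) @ X
    constraints += [A @ V[:, i] >= 0, A @ W[:, i] >= 0]

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(residual) + beta * group_l1),
                  constraints)
prob.solve()
print("optimal objective:", prob.value)
```

Because the problem is convex, any standard solver returns a global optimum of this subsampled program; the paper's contribution is to show that an analogous (larger) convex program exactly captures three-layer ReLU training with weight decay.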