Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time

We study the training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations whose complexity is polynomial in the number of data samples, the number of neurons, and the data dimension. More specifically, we develop a convex analytic framework that uses semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an $\ell_2$ norm regularized convex program. We then show that three-layer CNN training problems are equivalent to an $\ell_1$ regularized convex program that encourages sparsity in the spectral domain. We also extend these results to multi-layer CNN architectures, including three-layer networks with two ReLU layers and deeper circular convolutions with a single ReLU layer. Furthermore, we present extensions of our approach to different pooling methods, which elucidate the implicit architectural bias of each design as a convex regularizer.
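To make the two-layer result concrete, the analogous finite convex program for a fully connected two-layer ReLU network (Pilanci & Ergen, ICML 2020), which the present work adapts to convolutional architectures, can be written as

$$
\min_{\{v_i, w_i\}_{i=1}^{P}} \; \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2 + \beta \sum_{i=1}^{P} \big(\|v_i\|_2 + \|w_i\|_2\big)
\quad \text{s.t.} \;\; (2D_i - I_n) X v_i \ge 0,\; (2D_i - I_n) X w_i \ge 0 \;\; \forall i,
$$

where $X \in \mathbb{R}^{n \times d}$ is the data matrix and the diagonal matrices $D_i = \mathrm{diag}(\mathbb{1}\{X u_i \ge 0\})$ enumerate the ReLU hyperplane-arrangement activation patterns. The sketch below solves a subsampled version of this program with CVXPY. It is a minimal illustration under stated assumptions (random sampling of the arrangement patterns, squared loss), not the authors' released code, and names such as `sample_arrangements` are hypothetical.

```python
import numpy as np
import cvxpy as cp

def sample_arrangements(X, num_samples=100, seed=0):
    """Subsample ReLU activation patterns 1{X u >= 0} via random Gaussian u.

    Exact enumeration of all patterns is possible in polynomial time for
    fixed rank(X); random subsampling keeps this sketch small.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int).tolist())
                for _ in range(num_samples)}
    return [np.array(p, dtype=float) for p in patterns]

def two_layer_relu_convex(X, y, beta=1e-3, num_samples=100):
    """Solve the (subsampled) convex program equivalent to two-layer ReLU training."""
    n, d = X.shape
    masks = sample_arrangements(X, num_samples)
    P = len(masks)
    V = cp.Variable((d, P))  # "positive" neurons v_i, one per pattern
    W = cp.Variable((d, P))  # "negative" neurons w_i, one per pattern
    residual = -y.astype(float)
    constraints = []
    for i, mask in enumerate(masks):
        DiX = mask[:, None] * X                    # D_i X without forming diag(mask)
        signed = (2.0 * mask - 1.0)[:, None] * X   # (2 D_i - I_n) X
        residual = residual + DiX @ (V[:, i] - W[:, i])
        # Keep each neuron consistent with its assigned activation pattern.
        constraints += [signed @ V[:, i] >= 0, signed @ W[:, i] >= 0]
    # Group-l2 penalty over neurons: the convex counterpart of weight decay.
    reg = cp.sum(cp.norm(V, 2, axis=0) + cp.norm(W, 2, axis=0))
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(residual) + beta * reg),
                      constraints)
    prob.solve()
    return V.value, W.value, prob.value
```

An optimal two-layer ReLU network can be read off the nonzero columns of $V$ and $W$, and the group $\ell_2$ penalty on those columns corresponds to weight decay on both layers. In the paper's CNN formulations the role of $X$ is, roughly, played by matrices of image patches, so the optimization is over shared convolutional filters.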
