Convex Programs for Global Optimization of Convolutional Neural Networks in Polynomial-Time

We study the training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations whose complexity is polynomial in the number of data samples, the number of neurons, and the data dimension. In particular, we develop a convex analytic framework based on semi-infinite duality to obtain equivalent convex optimization problems for two-layer CNNs, in which the convex problems are regularized by a sum of ℓ2 norms of the variables.
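
To make the construction concrete, below is a minimal CVXPY sketch of this style of convex program for the two-layer, scalar-output case. It is an illustration under stated assumptions, not the paper's exact CNN formulation: the sign patterns D_i are sampled from random directions rather than enumerated (the exact program enumerates all ReLU sign patterns of the data, which is polynomial in number when the data matrix has fixed rank), and the plain data matrix X stands in for the matrix of image patches used in the convolutional case. All names and sizes (`n`, `d`, `beta`, `num_samples`) are illustrative.

```python
import numpy as np
import cvxpy as cp

# Toy problem: n samples in d dimensions with scalar targets. In the
# convolutional case, X would be replaced by the matrix of image patches.
np.random.seed(0)
n, d = 20, 5
X = np.random.randn(n, d)
y = np.random.randn(n)
beta = 0.1  # regularization strength (illustrative)

# Sample ReLU sign patterns D_i = diag(1[X u >= 0]) from random directions u.
# The exact formulation enumerates all such patterns; random sampling
# only approximates that enumeration.
num_samples = 10
patterns = {tuple((X @ np.random.randn(d) >= 0).astype(int))
            for _ in range(num_samples)}
D = [np.diag(p) for p in patterns]
m = len(D)

# Convex program: squared loss plus a group-lasso penalty (sum of l2 norms),
# with cone constraints forcing each (v_i, w_i) to respect its pattern D_i.
V = [cp.Variable(d) for _ in range(m)]
W = [cp.Variable(d) for _ in range(m)]
pred = sum(Di @ X @ (Vi - Wi) for Di, Vi, Wi in zip(D, V, W))
loss = 0.5 * cp.sum_squares(pred - y)
reg = beta * sum(cp.norm(Vi, 2) + cp.norm(Wi, 2) for Vi, Wi in zip(V, W))
constraints = []
for Di, Vi, Wi in zip(D, V, W):
    A = (2 * Di - np.eye(n)) @ X  # rows flip sign where the ReLU is inactive
    constraints += [A @ Vi >= 0, A @ Wi >= 0]

problem = cp.Problem(cp.Minimize(loss + reg), constraints)
problem.solve()
print("optimal objective:", problem.value)
```

In this line of work, a ReLU network is then recovered from the optimal variables: each nonzero v_i or w_i yields a hidden neuron whose first-layer weights are a rescaling of that vector, with the corresponding output weight set by its norm.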
