Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
[1] Guanghui Lan, et al. Complexity of Training ReLU Neural Networks, 2018.
[2] Ji Zhu, et al. ℓ1 Regularization in Infinite Dimensional Feature Spaces, 2007, COLT.
[3] Sylvain Gelly, et al. Gradient Descent Quantizes ReLU Network Features, 2018, ArXiv.
[4] Yoshua Bengio, et al. On the Spectral Bias of Neural Networks, 2018, ICML.
[5] Michael Eickenberg, et al. Greedy Layerwise Learning Can Scale to ImageNet, 2018, ICML.
[6] Guy Blanc, et al. Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process, 2019, COLT.
[7] Nathan Srebro, et al. How do infinite width bounded norm networks look in function space?, 2019, COLT.
[8] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[9] Mert Pilanci, et al. Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks, 2020, ICML.
[10] M. Sion. On general minimax theorems, 1958.
[11] P. Bühlmann, et al. The group lasso for logistic regression, 2008.
[12] Mert Pilanci, et al. Convex Geometry and Duality of Over-parameterized Neural Networks, 2020, J. Mach. Learn. Res..
[13] R. Winder. Partitions of N-Space by Hyperplanes, 1966.
[14] Matthias Bethge, et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness, 2018, ICLR.
[15] Pablo A. Parrilo, et al. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, 2007, SIAM Rev..
[16] Thomas M. Cover, et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition, 1965, IEEE Trans. Electron. Comput..
[17] Raimund Seidel, et al. Constructing Arrangements of Lines and Hyperplanes with Applications, 1986, SIAM J. Comput..
[18] Kim-Chuan Toh, et al. SDPT3 — a Matlab software package for semidefinite-quadratic-linear programming, version 3.0, 2001.
[19] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[20] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[21] A. Shapiro. Semi-infinite programming, duality, discretization and optimality conditions, 2009.
[22] Robert D. Nowak, et al. Minimum "Norm" Neural Networks are Splines, 2019, ArXiv.
[23] W. Rudin. Principles of mathematical analysis, 1964.
[24] Stephen Boyd, et al. A Rewriting System for Convex Optimization Problems, 2017, ArXiv.
[25] Mert Pilanci, et al. Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models, 2020, AISTATS.
[26] Piyush C. Ojha, et al. Enumeration of linear threshold functions from the lattice of hyperplane intersections, 2000, IEEE Trans. Neural Networks Learn. Syst..
[27] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res..
[28] Stephen P. Boyd, et al. CVXPY: A Python-Embedded Modeling Language for Convex Optimization, 2016, J. Mach. Learn. Res..
[29] Nicolas Le Roux, et al. Convex Neural Networks, 2005, NIPS.
[30] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[31] R. Stanley. An Introduction to Hyperplane Arrangements, 2007.