Revealing the Structure of Deep Neural Networks via Convex Duality

We study regularized deep neural networks (DNNs) and introduce a convex-analytic framework to characterize the structure of their hidden layers. We show that a set of optimal hidden-layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks with $K$ outputs, we prove that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, we prove that norm-regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, a result previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when the data is rank-one or whitened. We then verify our theory via numerical experiments.
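
The rank-$K$ claim for deep linear networks can be checked empirically. Below is a minimal sketch of such a check, assuming PyTorch; the architecture, planted data, regularization strength, and rank threshold are illustrative choices of ours, not taken from the paper. It trains a norm-regularized deep linear network with $K$ outputs and reports the numerical rank of each layer's weight matrix, which the characterization above predicts should be at most $K$.

    # Illustrative sketch (assumed PyTorch); hyperparameters are not from the paper.
    import torch

    torch.manual_seed(0)
    n, d, K, width, depth = 100, 20, 3, 50, 4   # samples, input dim, outputs, hidden width, layers
    lam = 1e-3                                   # squared-norm regularization strength (assumed)

    X = torch.randn(n, d)
    Y = X @ torch.randn(d, K)                    # planted linear targets

    layers = [torch.nn.Linear(d, width, bias=False)]
    for _ in range(depth - 2):
        layers.append(torch.nn.Linear(width, width, bias=False))
    layers.append(torch.nn.Linear(width, K, bias=False))
    model = torch.nn.Sequential(*layers)         # deep *linear* network: no activations

    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(5000):
        opt.zero_grad()
        reg = sum((W ** 2).sum() for W in model.parameters())
        loss = torch.nn.functional.mse_loss(model(X), Y) + lam * reg
        loss.backward()
        opt.step()

    # Numerical rank of each trained weight matrix (threshold is an arbitrary choice).
    for i, layer in enumerate(model):
        s = torch.linalg.svdvals(layer.weight.detach())
        rank = int((s > 1e-3 * s[0]).sum())
        print(f"layer {i}: numerical rank = {rank} (K = {K})")

In this sketch one expects every printed rank to collapse to roughly K = 3 after training, mirroring the rank-$K$ structure stated in the abstract; the whitened-data ReLU case would require a different setup and is not covered here.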
