Revealing the Structure of Deep Neural Networks via Convex Duality
[1] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[2] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[3] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[4] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[5] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[6] Makoto Yamada, et al. FsNet: Feature Selection Network on High-dimensional Biological Data, 2020, ArXiv.
[7] Wei Hu, et al. Width Provably Matters in Optimization for Deep Linear Neural Networks, 2019, ICML.
[8] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[9] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[10] Nathan Srebro, et al. How do infinite width bounded norm networks look in function space?, 2019, COLT.
[11] Jin Keun Seo, et al. Framelet pooling aided deep learning network: the method to process high dimensional medical data, 2019, Mach. Learn. Sci. Technol.
[12] Mert Pilanci, et al. Convex Neural Autoregressive Models: Towards Tractable, Expressive, and Theoretically-Backed Models for Sequential Forecasting and Generation, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Mert Pilanci, et al. Convex Optimization for Shallow Neural Networks, 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[14] Mert Pilanci, et al. Convex Duality and Cutting Plane Methods for Over-parameterized Neural Networks, 2019.
[15] Wei Hu, et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks, 2018, ICLR.
[16] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[17] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[18] Ohad Shamir, et al. Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks, 2018, COLT.
[19] Mert Pilanci, et al. Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models, 2020, AISTATS.
[20] Qiang Liu, et al. On the Margin Theory of Feedforward Neural Networks, 2018, ArXiv.
[21] Robert D. Nowak, et al. Minimum "Norm" Neural Networks are Splines, 2019, ArXiv.
[22] Ji Zhu, et al. ℓ1 Regularization in Infinite Dimensional Feature Spaces, 2007, COLT.
[23] Ruslan Salakhutdinov, et al. Deep Neural Networks with Multi-Branch Architectures Are Intrinsically Less Non-Convex, 2019, AISTATS.
[24] Lei Huang, et al. Decorrelated Batch Normalization, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Thomas Laurent, et al. Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global, 2017, ICML.
[26] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel, 2018, NeurIPS.
[27] Mert Pilanci, et al. Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms, 2020, ICLR.
[28] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming, 2018, ArXiv.
[29] Morteza Mardani, et al. Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization, 2021, ICLR.
[30] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[31] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[32] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[33] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[34] Sylvain Gelly, et al. Gradient Descent Quantizes ReLU Network Features, 2018, ArXiv.
[35] Mert Pilanci, et al. Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time, 2021, ICLR.
[36] Mert Pilanci, et al. Convex Geometry and Duality of Over-parameterized Neural Networks, 2020, J. Mach. Learn. Res.
[37] David L. Donoho, et al. Prevalence of neural collapse during the terminal phase of deep learning training, 2020, Proceedings of the National Academy of Sciences.
[38] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[39] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[40] Yu-Chiang Frank Wang, et al. A Closer Look at Few-shot Classification, 2019, ICLR.
[41] Mert Pilanci, et al. Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks, 2020, ICML.
[42] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.