Paper Information - The Emergence of Spectral Universality in Deep Networks

The Emergence of Spectral Universality in Deep Networks

Recent work has shown that tightly concentrating the entire singular value spectrum of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. To guide design choices, it is therefore important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters, including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.
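To make the central object concrete, the following is a minimal numerical sketch, not the paper's code, of the singular values of an input-output Jacobian at initialization. It assumes a bias-free fully connected network of constant width, so the Jacobian is a product of per-layer factors `D_l W_l`, where `D_l` is the diagonal matrix of activation derivatives. The function names, the parameter `sigma_w`, and the two initialization schemes compared here are illustrative assumptions.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fixing the column signs with diag(r) makes the sample Haar-distributed.
    return q * np.sign(np.diag(r))


def jacobian_singular_values(depth, width, init="orthogonal", phi="linear",
                             sigma_w=1.0, seed=0):
    """Singular values of the input-output Jacobian of a random network.

    Assumptions of this sketch: no biases, constant width, one random input.
    The Jacobian is accumulated layer by layer as J <- D_l W_l J.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    jac = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            w = sigma_w * random_orthogonal(width, rng)
        else:
            # Gaussian entries with variance sigma_w**2 / width.
            w = sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)
        h = w @ x
        if phi == "tanh":
            x = np.tanh(h)
            d = 1.0 - x ** 2  # derivative of tanh at the pre-activations
        else:
            x = h
            d = np.ones(width)  # linear network: derivative is one
        jac = (d[:, None] * w) @ jac  # left-multiply by D_l W_l
    return np.linalg.svd(jac, compute_uv=False)


if __name__ == "__main__":
    for init in ("orthogonal", "gaussian"):
        s = jacobian_singular_values(depth=20, width=64, init=init)
        print(init, "min:", s.min(), "max:", s.max())
```

In the linear case with orthogonal weights the product of layers is itself orthogonal, so every singular value equals one up to floating-point error, whereas the Gaussian product spreads over many orders of magnitude as depth grows. Setting `phi="tanh"` lets one explore numerically how a nonlinearity changes the picture; this sketch makes no claim about reproducing the paper's limiting distributions.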
