On Random Matrices Arising in Deep Neural Networks. Gaussian Case

The paper deals with the distribution of singular values of products of random matrices arising in the analysis of deep neural networks. These matrices resemble product analogs of sample covariance matrices; an important difference, however, is that the population covariance matrices, which are assumed to be non-random in the standard setting of statistics and random matrix theory, are now random and, moreover, are certain functions of the random data matrices. The problem was considered in recent work [21] using techniques of free probability theory. Since, however, free probability theory treats population matrices that are independent of the data matrices, its applicability in this case requires additional justification. We provide this justification by using a version of the standard techniques of random matrix theory, under the assumption that the entries of the data matrices are independent Gaussian random variables. In the subsequent paper [18] we extend our results to the case where the entries of the data matrices are merely independent identically distributed random variables with several finite moments. This, in particular, extends the property of so-called macroscopic universality to the random matrices under consideration.
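
To make the setting concrete, the following minimal numerical sketch (not taken from the paper) samples the empirical singular value distribution of a product of the kind described above: the input-output Jacobian of a random feedforward network, in which the Gaussian weight matrices W_l play the role of data matrices and the diagonal matrices D_l of activation derivatives play the role of the data-dependent "population" matrices. The width n, depth L, weight scale sigma_w, and the tanh nonlinearity are illustrative assumptions, not parameters from the paper.

```python
# Minimal sketch (illustrative assumptions throughout): singular values of a
# product J = D_L W_L ... D_1 W_1, where the W_l are i.i.d. Gaussian weight
# matrices and the D_l are diagonal matrices of activation derivatives
# evaluated on propagated data, so the "population" factors D_l are themselves
# functions of the random matrices.
import numpy as np

rng = np.random.default_rng(0)

n = 500          # layer width (illustrative)
L = 4            # network depth (illustrative)
sigma_w = 1.0    # weight scale; W_l has i.i.d. N(0, sigma_w^2 / n) entries

x = rng.standard_normal(n)   # random input vector playing the role of data
J = np.eye(n)                # accumulates the product of per-layer Jacobians
for _ in range(L):
    W = sigma_w * rng.standard_normal((n, n)) / np.sqrt(n)
    h = W @ x                             # pre-activation at this layer
    D = np.diag(1.0 - np.tanh(h) ** 2)    # derivative of tanh, data-dependent
    J = D @ W @ J                         # per-layer Jacobian times the rest
    x = np.tanh(h)                        # propagate the signal forward

squared_sv = np.linalg.svd(J, compute_uv=False) ** 2
print("mean squared singular value:", squared_sv.mean())
```

Histogramming squared_sv over many independent draws (and increasing n) gives a numerical approximation to the kind of limiting singular value distribution that the paper analyzes rigorously in the Gaussian case.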

[1] Ausif Mahmood, et al. Review of Deep Learning Algorithms and Architectures, 2019, IEEE Access.

[2] Robert C. Qiu, et al. Spectrum Concentration in Deep Residual Learning: A Free Probability Approach, 2018, IEEE Access.

[3] Roman Vershynin, et al. High-Dimensional Probability, 2018.

[4] Dong Eui Chang, et al. Deep Neural Networks in a Mathematical Framework, 2018, SpringerBriefs in Computer Science.

[5] Surya Ganguli, et al. The Emergence of Spectral Universality in Deep Networks, 2018, AISTATS.

[6] Richard E. Turner, et al. Gaussian Process Behaviour in Wide Deep Neural Networks, 2018, ICLR.

[7] A. Chakrabarty, et al. A note on the folklore of free independence, 2018, arXiv:1802.00952.

[8] Michael W. Mahoney, et al. Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior, 2017, ArXiv.

[9] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.

[10] Dianhui Wang, et al. Randomness in neural networks: an overview, 2017, WIREs Data Mining Knowl. Discov.

[11] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.

[12] Roland Speicher, et al. Free Probability and Random Matrices, 2014, arXiv:1404.3393.

[13] Nikhil Buduma, et al. Fundamentals of deep learning, 2017.

[14] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.

[15] Guillermo Sapiro, et al. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, 2015, IEEE Transactions on Signal Processing.

[16] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.

[17] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.

[18] F. Götze, et al. Asymptotic spectra of matrix-valued functions of independent random matrices and free probability, 2014, arXiv:1408.1732.

[19] R. Couillet, et al. Analysis of the limiting spectral measure of large random matrices of the separable covariance type, 2013, arXiv:1310.8094.

[20] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] L. Pastur, et al. Eigenvalue Distribution of Large Random Matrices, 2011.

[22] Zhenghao Chen, et al. On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[23] J. W. Silverstein, et al. Spectral Analysis of Large Dimensional Random Matrices, 2009.

[24] Ralf R. Müller, et al. On the asymptotic eigenvalue distribution of concatenated vector-valued fading channels, 2002, IEEE Trans. Inf. Theory.

[25] F. Hiai, et al. The semicircle law, free random variables, and entropy, 2006.

[26] R. Muirhead, Aspects of Multivariate Statistical Theory, 1982, Wiley Series in Probability and Statistics.