Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method

We compute the asymptotic empirical spectral distribution of a non-linear random matrix model using the resolvent method. Motivated by random neural networks, we consider the random matrix $M = Y Y^\ast$ with $Y = f(WX)$, where $W$ and $X$ are random rectangular matrices with i.i.d. centred entries and $f$ is a smooth non-linear function applied entry-wise. We prove that the Stieltjes transform of the limiting spectral distribution satisfies a quartic self-consistent equation up to error terms; this is exactly the equation obtained by [Pennington, Worah] and [Benigni, Péché] with the moment method. In addition, we extend these results to the case of an additive bias, $Y = f(WX + B)$, where $B$ is an independent rank-one Gaussian random matrix; this more closely models the neural network architectures encountered in practice. Our approach, which follows the resolvent method, is more robust than the moment method and is expected to provide insights also for models where the combinatorics of the latter become intractable.
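To make the model concrete, the following is a minimal numerical sketch, not the authors' code, that samples $M$ and evaluates the empirical Stieltjes transform $s_n(z) = \frac{1}{n_1}\sum_i (\lambda_i - z)^{-1}$ of its eigenvalues. Several choices here are assumptions for illustration only: Gaussian entries with $W_{ij} \sim N(0, 1/n_0)$ and $X_{ij} \sim N(0, 1)$, the normalisation $M = Y Y^\ast / m$, the activation $f = \tanh$, and a hypothetical rank-one bias of the form $B = b\,\mathbf{1}^\ast$ with $b$ a standard Gaussian vector. The function names `sample_M` and `stieltjes` are also hypothetical.

```python
import numpy as np

def sample_M(n0, n1, m, f=np.tanh, bias=False, rng=None):
    """Sample M = Y Y^T / m with Y = f(W X) or, if bias=True, Y = f(W X + B).

    Assumed scalings (illustrative only): W has i.i.d. N(0, 1/n0) entries,
    X has i.i.d. N(0, 1) entries, so each entry of W X is O(1).
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((n1, n0)) / np.sqrt(n0)
    X = rng.standard_normal((n0, m))
    Z = W @ X
    if bias:
        # Hypothetical rank-one bias B = b 1^T: adding a column vector
        # broadcasts it across all m columns of Z.
        b = rng.standard_normal((n1, 1))
        Z = Z + b
    Y = f(Z)  # non-linearity applied entry-wise
    return Y @ Y.T / m

def stieltjes(eigs, z):
    """Empirical Stieltjes transform (1/n) sum_i 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigs - z))

M = sample_M(n0=400, n1=400, m=800, rng=0)
eigs = np.linalg.eigvalsh(M)  # M is symmetric positive semi-definite
z = 1.0 + 0.1j
print("spectrum range:", eigs.min(), eigs.max())
print("s_n(z) =", stieltjes(eigs, z))
```

With the convention $(\lambda - z)^{-1}$, $\operatorname{Im} s_n(z) > 0$ whenever $\operatorname{Im} z > 0$. Comparing $s_n(z)$ across growing $n_0, n_1, m$ at fixed ratios gives a rough empirical check against a limiting self-consistent equation. The sketch does not implement that equation itself.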

[1] Shun-ichi Amari, Understand It in 5 Minutes!? Skimming Famous Papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.

[2] László Erdős, et al. Random Matrices with Slow Correlation Decay, 2017, Forum of Mathematics, Sigma.

[3] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[4] Yukun He, et al. Mesoscopic eigenvalue statistics of Wigner matrices, 2016, arXiv:1603.01499.

[5] Zhenyu Liao, et al. A Random Matrix Approach to Neural Networks, 2017, arXiv.

[6] Boris A. Khoruzhenko, et al. Asymptotic properties of large random matrices with independent entries, 1996.

[7] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.

[8] Zhenyu Liao, et al. On the Spectrum of Random Features Maps of High Dimensional Data, 2018, ICML.

[9] L. Erdős, et al. Stability of the matrix Dyson equation and random matrices with correlations, 2016, arXiv:1604.08188.

[10] Lucas Benigni, et al. Eigenvalue distribution of nonlinear models of random matrices, 2019, arXiv.

[11] Thomas Dupic, et al. Spectral density of products of Wishart dilute random matrices. Part I: the dense case, 2014, arXiv:1401.7802.

[12] Ron Rosenthal, et al. Isotropic self-consistent equations for mean-field random matrices, 2016, arXiv:1611.05364.

[13] Z. Fan, et al. Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks, 2020, NeurIPS.

[14] Peter McCullagh, et al. Cumulants and Partition Lattices, 2012.

[15] Ziliang Che, et al. Edge universality of correlated Gaussians, 2019, Electronic Journal of Probability.

[16] Jeffrey Pennington, et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network, 2018, NeurIPS.

[17] Jeffrey Pennington, et al. A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions, 2019, AISTATS.

[18] Peter J. Forrester, et al. Eigenvalue statistics for product complex Wishart matrices, 2014, arXiv:1401.2572.

[19] Romain Couillet, et al. Concentration of Measure and Large Random Matrices with an application to Sample Covariance Matrices, 2018, arXiv:1805.08295.