Paper information - Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method - 字舞流文

Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method

We compute the asymptotic empirical spectral distribution of a non-linear random matrix model by using the resolvent method. Motivated by random neural networks, we consider the random matrix $M = Y Y^\ast$ with $Y = f(WX)$, where $W$ and $X$ are random rectangular matrices with i.i.d. centred entries and $f$ is a smooth non-linear function applied entry-wise. We prove that the Stieltjes transform of the limiting spectral distribution satisfies a quartic self-consistent equation up to some error terms, which is exactly the equation obtained by [Pennington, Worah] and [Benigni, Péché] using the moment method. In addition, we extend the previous results to the case of an additive bias, $Y = f(WX+B)$, with $B$ being an independent rank-one Gaussian random matrix, which more closely models the neural network architectures encountered in practice. Our approach, which follows the resolvent method, is more robust than the moment method and is expected to provide insights also for models where the combinatorics of the latter become intractable.
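The model in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: it samples $M = Y Y^\ast$ with $Y = f(WX)$ and evaluates the empirical Stieltjes transform in two equivalent ways, from the eigenvalues and from the normalised trace of the resolvent $(M - z)^{-1}$. The Gaussian entries, the scalings $1/\sqrt{n_0}$ and $1/m$, the choice $f = \tanh$, the matrix sizes, and the function names are all assumptions made for illustration, since the abstract does not fix them.

```python
import numpy as np


def sample_model(n0=200, n1=300, m=400, f=np.tanh, seed=0):
    """Sample M = Y Y^T / m with Y = f(W X / sqrt(n0)).

    The normalisations and the Gaussian entries are assumptions for this
    sketch; the abstract only requires i.i.d. centred entries.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n1, n0))   # weights, i.i.d. centred
    X = rng.standard_normal((n0, m))    # data, i.i.d. centred
    Y = f(W @ X / np.sqrt(n0))          # non-linearity applied entry-wise
    M = Y @ Y.T / m                     # real symmetric n1 x n1 matrix
    return M


def stieltjes_from_eigenvalues(eigs, z):
    """Empirical Stieltjes transform: average of 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigs - z))


def stieltjes_from_resolvent(M, z):
    """Normalised trace of the resolvent G(z) = (M - z)^{-1}."""
    n = M.shape[0]
    G = np.linalg.inv(M - z * np.eye(n))
    return np.trace(G) / n


M = sample_model()
eigs = np.linalg.eigvalsh(M)            # eigenvalues of the symmetric matrix M
z = 1.0 + 0.5j                          # spectral parameter in the upper half-plane
m_eigs = stieltjes_from_eigenvalues(eigs, z)
m_res = stieltjes_from_resolvent(M, z)
print(m_eigs, m_res)
```

Both evaluations agree up to floating-point error because the trace of the resolvent is the sum of $1/(\lambda_i - z)$ over the eigenvalues $\lambda_i$ of $M$. A histogram of `eigs` gives a finite-size picture of the empirical spectral distribution whose limit the paper characterises.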

Dominik Schröder | Vanessa Piccolo

[1] 俊一甘利. Understand in 5 minutes!? Skimming famous papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clement: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.

[2] László Erdős, et al. Random matrices with slow correlation decay, 2017, Forum of Mathematics, Sigma.

[3] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[4] Yukun He, et al. Mesoscopic eigenvalue statistics of Wigner matrices, 2016, 1603.01499.

[5] Zhenyu Liao, et al. A Random Matrix Approach to Neural Networks, 2017, ArXiv.

[6] Boris A. Khoruzhenko, et al. Asymptotic properties of large random matrices with independent entries, 1996.

[7] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.

[8] Zhenyu Liao, et al. On the Spectrum of Random Features Maps of High Dimensional Data, 2018, ICML.

[9] L. Erdős, et al. Stability of the matrix Dyson equation and random matrices with correlations, 2016, 1604.08188.

[10] Lucas Benigni, et al. Eigenvalue distribution of nonlinear models of random matrices, 2019, ArXiv.

[11] Thomas Dupic, et al. Spectral density of products of Wishart dilute random matrices. Part I: the dense case, 2014, 1401.7802.

[12] Ron Rosenthal, et al. Isotropic self-consistent equations for mean-field random matrices, 2016, 1611.05364.

[13] Z. Fan, et al. Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks, 2020, NeurIPS.

[14] Peter McCullagh, et al. Cumulants and Partition Lattices, 2012.

[15] Ziliang Che, et al. Edge universality of correlated Gaussians, 2019, Electronic Journal of Probability.

[16] Jeffrey Pennington, et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network, 2018, NeurIPS.

[17] Jeffrey Pennington, et al. A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions, 2019, AISTATS.

[18] Peter J. Forrester, et al. Eigenvalue statistics for product complex Wishart matrices, 2014, 1401.2572.

[19] Romain Couillet, et al. Concentration of Measure and Large Random Matrices with an application to Sample Covariance Matrices, 2018, 1805.08295.