Appearance of random matrix theory in deep learning
Diego Granziol | Nicholas P. Baskerville | Jonathan P. Keating
[1] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[2] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[3] Yann LeCun, et al. Explorations on high dimensional landscapes, 2014, ICLR.
[4] E. Gardner, et al. Optimal storage properties of neural network models, 1988.
[5] C. Beenakker. Random-matrix theory of quantum transport, 1996, arXiv:cond-mat/9612179.
[6] Vardan Papyan, et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size, 2018.
[7] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.
[8] Daniel A. Roberts, et al. The Principles of Deep Learning Theory, 2021, arXiv.
[9] Stefano Soatto, et al. On the energy landscape of deep networks, 2015, arXiv:1511.06485.
[10] Y. Fyodorov. Complexity of random energy landscapes, glass transition, and absolute value of the spectral determinant of random matrices, 2004.
[11] R. Adler, et al. Random Fields and Geometry, 2007.
[12] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[13] Taiji Suzuki, et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint, 2020, ICLR.
[14] M. Berry, et al. Level clustering in the regular spectrum, 1977, Proceedings of the Royal Society of London A: Mathematical and Physical Sciences.
[15] Stephen Tyree, et al. Exact Gaussian Processes on a Million Data Points, 2019, NeurIPS.
[16] Y. Fyodorov, et al. Hessian spectrum at the global minimum of high-dimensional random landscapes, 2018, Journal of Physics A: Mathematical and Theoretical.
[17] Vardan Papyan, et al. Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians, 2019, ICML.
[18] Francesco Mezzadri, et al. A Spin Glass Model for the Loss Surfaces of Generative Adversarial Networks, 2021, Journal of Statistical Physics.
[19] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[20] G. Meurant, et al. The Lanczos and conjugate gradient algorithms in finite precision arithmetic, 2006, Acta Numerica.
[21] A. Buchleitner, et al. Spectral backbone of excitation transport in ultracold Rydberg gases, 2014, arXiv:1409.5625.
[22] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[23] M. Berry, et al. Semiclassical level spacings when regular and chaotic orbits coexist, 1984.
[24] Antonio Auffinger, et al. Random Matrices and Complexity of Spin Glasses, 2010, arXiv:1003.1129.
[25] Florent Krzakala, et al. Statistical physics of inference: thresholds and algorithms, 2015, arXiv.
[26] S. Kak. Information, physics, and computation, 1996.
[27] Stephen J. Roberts, et al. Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods, 2019.
[28] Shun-ichi Amari. Understand it in 5 minutes!? Skim-reading famous papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[29] Sompolinsky, et al. Spin-glass models of neural networks, 1985, Physical Review A.
[30] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, arXiv:1611.07476.
[31] Diego Granziol. Beyond Random Matrix Theory for Deep Networks, 2020, arXiv.
[32] G. Biroli, et al. Complex Energy Landscapes in Spiked-Tensor and Simple Glassy Models: Ruggedness, Arrangements of Local Minima, and Phase Transitions, 2018, Physical Review X.
[33] E. Bogomolny, et al. Distribution of the ratio of consecutive level spacings in random matrix ensembles, 2012, Physical Review Letters.
[34] Jean-Philippe Bouchaud, et al. Cleaning large correlation matrices: tools from random matrix theory, 2016, arXiv:1610.08104.
[35] Jeffrey Pennington, et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network, 2018, NeurIPS.
[36] Yue M. Lu, et al. A Precise Performance Analysis of Learning with Random Features, 2020, arXiv.
[37] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.
[38] L. Deng, et al. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web], 2012, IEEE Signal Processing Magazine.
[39] Marylou Gabrié. Mean-field inference methods for neural networks, 2019, arXiv.
[40] T. Dahlin, et al. A comparison of the Gauss-Newton and quasi-Newton methods in resistivity imaging inversion, 2002.
[41] Florent Krzakala, et al. Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model, 2021, arXiv.
[42] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[43] Stefan Carlsson, et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[44] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[45] Naftali Tishby, et al. Machine learning and the physical sciences, 2019, Reviews of Modern Physics.
[46] M. Stephanov, et al. Random Matrices, 2005, arXiv:hep-ph/0509286.
[47] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[48] Sherif M. Abuelenin, et al. Effect of Unfolding on the Spectral Statistics of Adjacency Matrices of Complex Networks, 2012, Complex Adaptive Systems.
[49] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Di He, et al. A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems, 2019, arXiv.
[51] Jeffrey Pennington, et al. The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization, 2020, ICML.
[52] Florent Krzakala, et al. Generalisation error in learning with random features and the hidden manifold model, 2020, ICML.
[53] James Martens. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[54] Léon Bottou. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[55] Florent Krzakala, et al. Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model, 2019, NeurIPS.
[56] Yan V. Fyodorov. Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity, 2007, arXiv:cond-mat/0702601.
[57] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[58] T. Tao. Topics in Random Matrix Theory, 2012.
[59] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[60] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[61] Yann LeCun, et al. Open Problem: The landscape of the loss surfaces of multilayer networks, 2015, COLT.
[62] Florent Krzakala, et al. The Gaussian equivalence of generative models for learning with shallow neural networks, 2020, MSML.
[63] Lucas Benigni, et al. Eigenvalue distribution of nonlinear models of random matrices, 2019, arXiv.
[64] C. Beenakker. Book review: Supersymmetry in Disorder and Chaos, by K. Efetov, Cambridge University Press, 1997. ISBN 0 521 47097 8.
[65] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[66] Surya Ganguli, et al. Statistical Mechanics of Deep Learning, 2020, Annual Review of Condensed Matter Physics.
[67] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[68] T. Guhr, et al. Random-matrix theories in quantum physics: common concepts, 1997, arXiv:cond-mat/9707301.
[69] P. L. Doussal, et al. Topology Trivialization and Large Deviations for the Minimum in the Simplest Random Optimization, 2013, arXiv:1304.0024.
[70] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[71] H. Weidenmuller, et al. Random Matrices and Chaos in Nuclear Physics, 2008, arXiv:0807.1070.