Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks

There have long been connections between statistical mechanics and neural networks, but in recent decades these connections have withered. However, in light of the recent failure of statistical learning theory and stochastic optimization theory to describe, even qualitatively, many properties of production-quality neural network models, researchers have revisited ideas from the statistical mechanics of neural networks. This tutorial will provide an overview of the area; it will go into detail on how connections with random matrix theory, and in particular heavy-tailed random matrix theory, can lead to a practical phenomenological theory for large-scale deep neural networks; and it will describe future directions.
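To make the random-matrix connection concrete, the sketch below (assuming Python with NumPy; the matrix sizes and the `esd`/`powerlaw_alpha` helpers are illustrative, not code from the tutorial) computes the empirical spectral density of a single weight matrix and a simple maximum-likelihood estimate of a power-law tail exponent, the kind of layer-level quantity that heavy-tailed random matrix analyses of trained networks examine.

```python
import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of the layer correlation
    matrix X = W^T W / N, where W is an N x M weight matrix (N >= M)."""
    N, M = W.shape
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

def powerlaw_alpha(eigs, xmin=None):
    """Continuous maximum-likelihood (Hill-type) estimate of the tail
    exponent alpha, using eigenvalues at or above xmin."""
    eigs = np.asarray(eigs)
    if xmin is None:
        xmin = np.quantile(eigs, 0.5)  # crude choice; real analyses tune xmin
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a trained layer: i.i.d. Gaussian weights give a
    # Marchenko-Pastur-like bulk spectrum; heavy-tailed spectra in trained
    # layers would show up as a fatter tail and a smaller alpha.
    W = rng.normal(size=(1024, 512)) / np.sqrt(1024)
    eigs = esd(W)
    print(f"max eigenvalue: {eigs.max():.3f}  alpha: {powerlaw_alpha(eigs):.2f}")
```

In practice this kind of analysis is run layer by layer on the weight matrices of an already-trained model, and the fitted tail exponents are compared across layers, architectures, and training runs rather than interpreted in isolation.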
