Replica mean field theory for the generalisation gap of deep neural networks

S. Ariosto, R. Pacelli, F. Ginelli, M. Gherardi, and P. Rotondo

Dipartimento di Scienza e Alta Tecnologia and Center for Nonlinear and Complex Systems, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy
I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milano, Italy
Dipartimento di Scienza Applicata e Tecnologia, Politecnico di Torino, 10129 Torino, Italy
Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
