A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks

We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's widths increase, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the \textit{mean field} limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics and network initializations. Central to the framework is the new idea of a \textit{neuronal embedding}, which comprises a non-evolving probability space that allows one to embed neural networks of arbitrary widths. We demonstrate two applications of our framework. Firstly, the framework gives a principled way to study the simplifying effects that independent and identically distributed initializations have on the mean field limit. Secondly, we prove a global convergence guarantee for two-layer and three-layer networks. Unlike previous works that rely on convexity, our result requires a certain universal approximation property, which is a distinctive feature of infinite-width neural networks. To the best of our knowledge, this is the first time global convergence has been established for neural networks of more than two layers in the mean field regime.
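To give a concrete sense of the limiting object in the simplest setting, the following is a minimal sketch of a two-layer mean field ODE system under continuous-time gradient flow on the population loss. The notation (neuron index $c$ in a fixed probability space $(\Omega, P)$ playing the role of a neuronal embedding, loss $\mathcal{L}$, activation $\sigma$) is illustrative and not taken verbatim from the paper, which treats multilayer architectures and more general learning dynamics.

\begin{align*}
  % Limit prediction: an expectation over the non-evolving neuron index C ~ P.
  \hat y(t, x) &= \mathbb{E}_{C \sim P}\!\left[ a(t, C)\, \sigma\!\left( \langle w(t, C), x \rangle \right) \right], \\
  % Per-neuron ODEs: gradient flow on the population loss E[ L(Y, \hat y(t, X)) ],
  % where \partial_2 \mathcal{L} denotes the derivative in the second argument.
  \partial_t w(t, c) &= -\, \mathbb{E}_{(X, Y)}\!\left[ \partial_2 \mathcal{L}\!\left( Y, \hat y(t, X) \right) a(t, c)\, \sigma'\!\left( \langle w(t, c), X \rangle \right) X \right], \\
  \partial_t a(t, c) &= -\, \mathbb{E}_{(X, Y)}\!\left[ \partial_2 \mathcal{L}\!\left( Y, \hat y(t, X) \right) \sigma\!\left( \langle w(t, c), X \rangle \right) \right].
\end{align*}

Here the probability space $(\Omega, P)$ does not evolve; only the functions $w(t, \cdot)$ and $a(t, \cdot)$ defined on it do, which is what allows finite-width networks of arbitrary sizes to be coupled to the same limit.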
