Global Convergence of Three-layer Neural Networks in the Mean Field Regime

In the mean field regime, neural networks are appropriately scaled so that, as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit. This offers a way to study large-width neural networks by analyzing the mean field limit. Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees. The extension to multilayer networks, however, has remained highly challenging, and little is known about optimization efficiency in the mean field regime when there are more than two layers. In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which, unlike previous works on two-layer networks, does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument.
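To make the scaling concrete, here is a minimal sketch (not the paper's code) of a three-layer network in a standard mean-field parameterization: each pre-activation is averaged over the incoming width, and per-layer learning rates are rescaled so that every weight moves at an O(1) rate, which is what keeps the infinite-width SGD dynamics nonlinear and nontrivial. The widths, tanh activation, toy data, and learning rate below are illustrative assumptions, not choices taken from the paper.

```python
# A minimal sketch, assuming a standard mean-field parameterization of a
# three-layer network; widths, activation, data, and hyperparameters are
# illustrative and not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n1, n2 = 5, 500, 500              # input dimension and hidden widths
W1 = rng.normal(size=(n1, d))        # first-layer weights
W2 = rng.normal(size=(n2, n1))       # second-layer weights
w3 = rng.normal(size=n2)             # output weights
phi = np.tanh
dphi = lambda z: 1.0 - np.tanh(z) ** 2

def forward(x):
    z1 = W1 @ x                      # shape (n1,)
    h1 = phi(z1)
    z2 = (W2 @ h1) / n1              # averaged over width n1 (mean-field scaling)
    h2 = phi(z2)
    y = (w3 @ h2) / n2               # averaged over width n2
    return z1, h1, z2, h2, y

def sgd_step(x, target, lr=0.1):
    """One SGD step on the squared loss, with mean-field learning rates."""
    global W1, W2, w3
    z1, h1, z2, h2, y = forward(x)
    err = y - target
    # Backpropagation; the raw gradients carry 1/n factors from the averaging.
    g3 = err * h2 / n2                         # dL/dw3, O(1/n2) per entry
    g_z2 = (err * w3 / n2) * dphi(z2)
    g2 = np.outer(g_z2, h1) / n1               # dL/dW2, O(1/(n1*n2)) per entry
    g_z1 = ((W2.T @ g_z2) / n1) * dphi(z1)
    g1 = np.outer(g_z1, x)                     # dL/dW1, O(1/n1) per entry
    # Layer-wise rescaling undoes the 1/n factors so updates stay O(1).
    w3 -= lr * n2 * g3
    W2 -= lr * n1 * n2 * g2
    W1 -= lr * n1 * g1
    return 0.5 * err ** 2

x, target = rng.normal(size=d), 1.0            # a single toy example
for step in range(201):
    loss = sgd_step(x, target)
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss:.4f}")
```

Loosely speaking, the rows of W1 and W2 here play the role of samples of the mean field limit; the fixed probability space that indexes such samples across all widths is what the paper's neuronal embedding formalizes.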
