Mean Field Analysis of Neural Networks: A Central Limit Theorem (3 Jun 2019)

We rigorously prove a central limit theorem for neural network models with a single hidden layer. The central limit theorem is proven in the asymptotic regime of simultaneously (A) large numbers of hidden units and (B) large numbers of stochastic gradient descent training iterations. Our result describes the neural network’s fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. The proof relies upon weak convergence methods from stochastic analysis. In particular, we prove relative compactness for the sequence of processes and uniqueness of the limiting process in a suitable Sobolev space.
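As a rough numerical illustration of the scaling described above, the sketch below trains many independent copies of a single-hidden-layer network with stochastic gradient descent and measures how much the output at a fixed input varies across copies. It is a minimal sketch under stated assumptions, not the construction used in the proof: the 1/N output normalization, the per-parameter step size alpha/N, the floor(N * t) iteration count, the tanh activation, the sin target, and every function name are hypothetical choices made for illustration. If the fluctuations around the mean-field limit are of order 1/sqrt(N), the standard deviation multiplied by sqrt(N) should stay roughly constant as N grows.

```python
import math
import random
import statistics


def train_once(n_hidden, t_final, alpha, rng):
    """Train one network g(x) = (1/N) * sum_i c_i * tanh(w_i * x) with SGD.

    Assumed scaling (hypothetical): floor(N * t_final) SGD iterations and a
    per-parameter step of alpha / N, so each parameter moves O(1) per unit t.
    """
    c = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    w = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    step = alpha / n_hidden
    for _ in range(int(n_hidden * t_final)):
        x = rng.gauss(0.0, 1.0)
        y = math.sin(x)  # hypothetical regression target
        s = [math.tanh(wi * x) for wi in w]
        err = y - sum(ci * si for ci, si in zip(c, s)) / n_hidden
        for i in range(n_hidden):
            ci = c[i]  # keep the old value for the w update
            c[i] += step * err * s[i]
            w[i] += step * err * ci * (1.0 - s[i] ** 2) * x
    return c, w


def output_at(c, w, x):
    """Evaluate the 1/N-normalized network at a single input x."""
    return sum(ci * math.tanh(wi * x) for ci, wi in zip(c, w)) / len(c)


def scaled_fluctuation(n_hidden, n_trials, rng, x0=1.0, t_final=0.5, alpha=1.0):
    """Return sqrt(N) times the std. dev. of g(x0) over independent runs."""
    outputs = []
    for _ in range(n_trials):
        c, w = train_once(n_hidden, t_final, alpha, rng)
        outputs.append(output_at(c, w, x0))
    return math.sqrt(n_hidden) * statistics.stdev(outputs)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (20, 80, 320):
        print(n, scaled_fluctuation(n, 100, rng))
```

The printed values are only an empirical check of the sqrt(N) scaling at one input and one time; they say nothing about the Gaussian law of the limit or the stochastic partial differential equation it satisfies.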
