A quantitative functional central limit theorem for shallow neural networks

We prove a quantitative functional central limit theorem for one-hidden-layer neural networks with a generic activation function. The rates of convergence that we establish depend heavily on the smoothness of the activation: they range from logarithmic for non-differentiable activations such as the ReLU to $\sqrt{n}$ for very regular ones. Our main tools are functional versions of the Stein–Malliavin approach; in particular, we rely heavily on a quantitative functional central limit theorem recently established by Bourguin and Campese (2020) [9].
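To fix ideas, here is a minimal sketch of the setting (the notation and the unit-variance Gaussian initialization are our own illustrative assumptions, not quoted from the paper). A one-hidden-layer network of width $n$ with i.i.d. random weights, normalized by $1/\sqrt{n}$, reads

$$ f_n(x) \;=\; \frac{1}{\sqrt{n}} \sum_{j=1}^{n} b_j \,\sigma\big(\langle w_j, x\rangle\big), \qquad b_j \sim \mathcal{N}(0,1), \quad w_j \sim \mathcal{N}(0, I_d), \ \text{all independent}. $$

Under mild integrability assumptions on $\sigma$, as $n \to \infty$ the process $f_n$ converges in distribution to a centered Gaussian process $G$ with covariance $\mathbb{E}\big[\sigma(\langle w, x\rangle)\,\sigma(\langle w, y\rangle)\big]$ (cf. [4], [25]). A quantitative functional CLT additionally controls a distance between the laws of $f_n$ and $G$, viewed as random elements of a suitable function space, and it is the dependence of this bound on $n$ that ranges from $\sqrt{n}$ for very smooth activations down to logarithmic for the ReLU.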

[1] S. Favaro et al., Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities, 2023, arXiv.
[2] D. Trevisan et al., Quantitative Gaussian Approximation of Randomly Initialized Deep Neural Networks, 2022, arXiv.
[3] A. Klukowski, Rate of Convergence of Polynomial Networks to Gaussian Processes, 2021, COLT.
[4] B. Hanin, Random Neural Networks in the Infinite Width Limit as Gaussian Processes, 2021, The Annals of Applied Probability.
[5] D. A. Roberts et al., The Principles of Deep Learning Theory, 2021, arXiv.
[6] G. Peccati et al., The multivariate functional de Jong CLT, 2021, Probability Theory and Related Fields.
[7] T. Schramm et al., Non-asymptotic approximations of neural networks by Gaussian processes, 2021, COLT.
[8] F. Bach et al., Deep Equals Shallow for ReLU Networks in Kernel Regimes, 2020, ICLR.
[9] S. Bourguin et al., Approximation of Hilbert-Valued Gaussians on Dirichlet structures, 2020, Electronic Journal of Probability.
[10] A. R. Klivans et al., Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals, 2019, NeurIPS.
[11] S. Yaida, Non-Gaussian processes and neural networks at finite widths, 2019, MSML.
[12] M. Taqqu et al., Four moments theorems on Markov chaos, 2018, The Annals of Probability.
[13] Y. Singer et al., Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[14] A. D. Barbour et al., Stein's Method, 2014.
[15] I. Nourdin et al., Stein's method, logarithmic Sobolev and transport inequalities, 2014, Geometric and Functional Analysis.
[16] G. Peccati et al., Normal Approximations with Malliavin Calculus: From Stein's Method to Universality, 2012.
[17] G. Peccati et al., Stein's method on Wiener chaos, 2007, arXiv:0712.2940.
[18] G. Lewicki et al., Approximation by Superpositions of a Sigmoidal Function, 2003.
[19] A. Pinkus, Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[20] A. Pinkus et al., Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.
[21] K. Hornik, Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.
[22] G. Cybenko, Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.
[23] K. Hornik et al., Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[24] G. Peccati et al., Malliavin–Stein method: a survey of some recent developments, 2021, Modern Stochastics: Theory and Applications.
[25] R. M. Neal, Priors for Infinite Networks, 1996.
[26] R. M. Neal, Bayesian learning for neural networks, 1995.