High-Order Approximation Rates for Shallow Neural Networks with Cosine and ReLU$^k$ Activation Functions

We study the approximation properties of shallow neural networks with an activation function that is a power of the rectified linear unit (ReLU$^k$). Specifically, we consider how the approximation rate depends on the dimension and on the smoothness, measured in the spectral Barron space, of the underlying function f to be approximated. We show that as the smoothness index s of f increases, shallow neural networks with ReLU$^k$ activation function obtain an improved approximation rate, up to a best possible rate of $O(n^{-(k+1)}\log(n))$ in $L^2$, independent of the dimension d. The significance of this result is that the activation function ReLU$^k$ is fixed independently of the dimension, whereas for classical methods the degree of polynomial approximation or the smoothness of the wavelets used would have to increase in order to take advantage of the dimension-dependent smoothness of f. In addition, we derive improved approximation rates for shallow neural networks with cosine activation function on the spectral Barron space. Finally, we prove lower bounds showing that the approximation rates attained are optimal under the given assumptions.
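For reference, the objects behind these rates can be written out explicitly. The display below is a sketch using the standard conventions of this literature; the form of the shallow network, the activation $\sigma_k$, and the normalization of the spectral Barron norm are the usual definitions, not quoted from the paper itself:

```latex
% Shallow network with n neurons and ReLU^k activation sigma_k(t) = max(0, t)^k:
\[
  f_n(x) = \sum_{i=1}^{n} a_i \,\sigma_k(\omega_i \cdot x + b_i),
  \qquad \sigma_k(t) = \max(0, t)^k,
\]
% with coefficients a_i, b_i in R and directions omega_i in R^d.

% Spectral Barron norm of smoothness index s (one common normalization),
% where \hat{f} denotes the Fourier transform of f:
\[
  \|f\|_{\mathcal{B}^s} = \int_{\mathbb{R}^d} (1 + |\xi|)^{s} \, |\hat{f}(\xi)| \, d\xi.
\]

% The dimension-independent rate stated in the abstract, for s large enough:
\[
  \inf_{f_n} \|f - f_n\|_{L^2} = O\!\left(n^{-(k+1)} \log n\right).
\]
```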
