Nonlinear Approximation and (Deep) ReLU Networks

This article is concerned with the approximation and expressive powers of deep neural networks. This is an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g., approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods such as variable knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicates that they actually possess even greater approximation power than these traditional methods of nonlinear approximation. The main results of this article prove that this is indeed the case. This is done by exhibiting large classes of functions that can be captured efficiently by neural networks but for which classical nonlinear methods fall short. The present article purposefully limits itself to studying the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and other activation functions can be envisioned. However, even in the simplest setting considered here, a theory that completely quantifies the approximation power of neural networks is still lacking.
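To make the comparison with variable knot splines concrete, the following is a minimal NumPy sketch (not taken from the article; the function `relu_net` and the hand-picked weights are illustrative assumptions). A one-hidden-layer univariate ReLU network with n units is a continuous piecewise-linear function whose breakpoints -b_k/w_k move with the parameters, so fitting the network is a form of free-knot, i.e., nonlinear, piecewise-linear approximation; with two units it already reproduces the hat function 1 - |2x - 1| exactly on [0, 1].

```python
import numpy as np

def relu(t):
    # ReLU activation applied elementwise
    return np.maximum(t, 0.0)

def relu_net(x, w, b, c, c0=0.0):
    """One-hidden-layer ReLU network f(x) = c0 + sum_k c[k] * relu(w[k]*x + b[k]).

    On an interval this is a continuous piecewise-linear function whose
    breakpoints -b[k]/w[k] are free parameters, so tuning (w, b, c) plays the
    role of choosing the knots in free-knot piecewise-linear approximation.
    """
    return c0 + relu(np.outer(x, w) + b) @ c

# Hypothetical hand-picked parameters: two hidden units reproducing the hat
# function h(x) = 1 - |2x - 1| on [0, 1], i.e. f(x) = 2*relu(x) - 4*relu(x - 1/2).
w = np.array([1.0, 1.0])    # input weights
b = np.array([0.0, -0.5])   # biases -> breakpoint at x = 1/2
c = np.array([2.0, -4.0])   # output weights

x = np.linspace(0.0, 1.0, 5)
print(relu_net(x, w, b, c))           # network output
print(1.0 - np.abs(2.0 * x - 1.0))    # target hat function (identical values)
```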
