Nonlinear Approximation and (Deep) ReLU Networks

This article is concerned with the approximation and expressive powers of deep neural networks, an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g., approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods, such as variable-knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicates that they actually possess even greater approximation power than these traditional methods. The main results of this article prove that this is indeed the case, by exhibiting large classes of functions that are efficiently captured by neural networks but on which classical nonlinear methods fall short. The present article purposefully limits itself to the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and to other activation functions can be envisioned; however, even in this simplest setting, a theory that completely quantifies the approximation power of neural networks is still lacking.
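The depth phenomenon behind such results can be made concrete with a standard example (a sketch of ours, not a construction taken from this article): the piecewise-linear "hat" function on [0, 1] is computed exactly by three ReLU units, and composing it with itself k times yields a sawtooth with 2^k linear pieces from only O(k) units, whereas any method that allocates one parameter per linear piece would need on the order of 2^k parameters. All function names below are hypothetical.

```python
# Illustrative sketch only: depth yields exponentially many linear pieces.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # The "hat" function on [0, 1], realized exactly by three ReLU units:
    # hat(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def sawtooth(x, k):
    # k-fold self-composition of hat: a ReLU network of depth k and width 3
    # whose output is a sawtooth with 2^k linear pieces on [0, 1].
    y = x
    for _ in range(k):
        y = hat(y)
    return y

# Sample on a dyadic grid so the breakpoints (multiples of 2^-k) land
# exactly on grid points and are easy to count.
x = np.linspace(0.0, 1.0, 2**10 + 1)
for k in (1, 2, 4, 8):
    y = sawtooth(x, k)
    # A breakpoint of a piecewise-linear function shows up as a nonzero
    # second difference at the corresponding grid point.
    breaks = np.count_nonzero(np.abs(np.diff(y, 2)) > 1e-8)
    print(f"depth {k}: {breaks + 1} linear pieces from {3 * k} ReLU units")
```

Running the sketch prints, e.g., 256 linear pieces from 24 ReLU units at depth 8; matching this piece count with a free-knot spline would require roughly 256 knots.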
