Optimal approximation of continuous functions by very deep ReLU networks

We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network. We establish the complete phase diagram of feasible approximation rates and show that it includes two distinct phases. One phase corresponds to slower approximations that can be achieved with constant-depth networks and continuous weight assignments. The other phase provides faster approximations at the cost of depths that necessarily grow as a power law $L\sim W^{\alpha}$, $0<\alpha\le 1$, and with necessarily discontinuous weight assignments. In particular, we prove that constant-width fully-connected networks of depth $L\sim W$ provide the fastest possible approximation rate $\|f-\widetilde f\|_\infty = O(\omega_f(O(W^{-2/\nu})))$, where $\nu$ is the dimension of the input cube; this rate cannot be achieved with shallower networks.
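
The fastest rate above is attained within the family of constant-width fully-connected ReLU networks whose depth grows linearly with the weight count. The sketch below (in PyTorch) only illustrates that architecture family; it is not the paper's explicit weight construction, and the helper name `constant_width_relu_net`, the width `H`, and the depth `L` are hypothetical names and illustrative values rather than quantities taken from the paper.

```python
# Minimal sketch (assumed PyTorch setup): a fully-connected ReLU network with
# input dimension nu, a fixed hidden width H, and L linear layers in total, so
# that the parameter count W grows linearly with the depth L.
import torch.nn as nn


def constant_width_relu_net(nu: int, H: int, L: int) -> nn.Sequential:
    """Constant-width fully-connected ReLU network: nu -> H -> ... -> H -> 1."""
    layers = [nn.Linear(nu, H), nn.ReLU()]
    for _ in range(L - 2):
        layers += [nn.Linear(H, H), nn.ReLU()]
    layers.append(nn.Linear(H, 1))  # scalar output approximating f
    return nn.Sequential(*layers)


if __name__ == "__main__":
    nu, H, L = 2, 16, 50  # illustrative choices, not values from the paper
    net = constant_width_relu_net(nu, H, L)
    W = sum(p.numel() for p in net.parameters())  # weights and biases
    print(f"depth L = {L}, total parameters W = {W}")
```

At a fixed width $H$, each additional layer contributes on the order of $H^2$ parameters, so the total count $W$ scales linearly with the depth $L$; this is the $L\sim W$ regime highlighted in the abstract.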
