Deep Network Approximation with Discrepancy Being Reciprocal of Width to Power of Depth

A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) and ReLU ($\max\{0,x\}$) activation functions, and hence we refer to such networks as Floor-ReLU networks. It is shown by construction that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can pointwise approximate a Lipschitz continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\mu\sqrt{d}\,N^{-\sqrt{L}}$, where $\mu$ is the Lipschitz constant of $f$. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})\,N^{-\sqrt{L}}$. As a consequence, this new network overcomes the curse of dimensionality in approximation power, since this approximation order is essentially $\sqrt{d}$ times a function of $N$ and $L$ that is independent of $d$.
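
To make the stated rate concrete, the following minimal sketch evaluates the width $\max\{d,\,5N+13\}$, depth $64dL+3$, and error bound $3\mu\sqrt{d}\,N^{-\sqrt{L}}$ from the statement above for a few depth parameters; the helper name `floor_relu_bound` is ours for illustration and is not part of the paper.

```python
import math

def floor_relu_bound(d, N, L, mu=1.0):
    """Width, depth, and error bound of the Floor-ReLU construction for a
    mu-Lipschitz function on [0,1]^d, per the statement above:
    width max{d, 5N+13}, depth 64dL+3, error 3*mu*sqrt(d)*N**(-sqrt(L))."""
    width = max(d, 5 * N + 13)
    depth = 64 * d * L + 3
    error = 3 * mu * math.sqrt(d) * N ** (-math.sqrt(L))
    return width, depth, error

# With the width parameter N fixed, raising the depth parameter L drives
# the bound down like N^(-sqrt(L)) while the network width never changes.
for L in (1, 4, 16, 64):
    width, depth, error = floor_relu_bound(d=10, N=8, L=L)
    print(f"L={L:3d}: width={width}, depth={depth}, bound={error:.3e}")
```

Note how the bound decays like $N^{-\sqrt{L}}$ at fixed width, which is the sense in which the curse of dimensionality is avoided: the dimension $d$ enters only through the $\sqrt{d}$ prefactor and the depth $64dL+3$, not through the rate in $N$ and $L$.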
