Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

A new network with super approximation power is introduced. This network is built with either the Floor ($\lfloor x\rfloor$) or the ReLU ($\max\{0,x\}$) activation function in each neuron, and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown that Floor-ReLU networks with width $\max\{d,\,5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation rate $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$, where $\alpha\in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})\,N^{-\sqrt{L}}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\le r^\alpha$ for Hölder continuous functions), since the major term of concern in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ inside the modulus of continuity.
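
To make the stated rate concrete, the following minimal Python sketch (an illustration only, not the paper's explicit construction) evaluates the Hölder error bound $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$ and the stated width $\max\{d,\,5N+13\}$ and depth $64dL+3$, and runs a toy forward pass through a network in which each hidden neuron applies either the floor or the ReLU activation. The function names, the random weights, and the truncated depth are placeholder assumptions made for demonstration.

```python
import numpy as np

def holder_error_bound(d, N, L, alpha=1.0, lam=1.0):
    """Bound 3*lam*d**(alpha/2)*N**(-alpha*sqrt(L)) for Hölder(alpha, lam)
    functions on [0,1]^d, as stated in the abstract."""
    return 3.0 * lam * d ** (alpha / 2.0) * N ** (-alpha * np.sqrt(L))

def network_size(d, N, L):
    """Width and depth stated in the abstract: max{d, 5N+13} and 64*d*L+3."""
    return max(d, 5 * N + 13), 64 * d * L + 3

def floor_relu_forward(x, weights, biases, use_floor):
    """Toy forward pass: each hidden neuron applies floor or ReLU.
    The weights, biases, and activation pattern are random placeholders,
    not the construction from the paper."""
    h = x
    for W, b, mask in zip(weights, biases, use_floor):
        pre = W @ h + b
        h = np.where(mask, np.floor(pre), np.maximum(pre, 0.0))
    return h

if __name__ == "__main__":
    d, N, L = 2, 4, 9
    width, depth = network_size(d, N, L)
    print("width =", width, ", depth =", depth)
    print("Hölder(1,1) error bound:", holder_error_bound(d, N, L))

    # Random toy network of the stated width (depth truncated to 3 layers here).
    rng = np.random.default_rng(0)
    x = rng.random(d)
    weights = [rng.standard_normal((width, d))] + \
              [rng.standard_normal((width, width)) for _ in range(2)]
    biases = [rng.standard_normal(width) for _ in range(3)]
    use_floor = [rng.random(width) < 0.5 for _ in range(3)]
    print("toy output (first 3 entries):",
          floor_relu_forward(x, weights, biases, use_floor)[:3])
```

Note how the bound decays like $N^{-\sqrt{L}}$: increasing the depth parameter $L$ improves the rate in the exponent rather than merely in a prefactor, which is the sense in which the approximation power here exceeds that of fixed-rate constructions.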
