Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

A new network with super approximation power is introduced. This network is built with either the Floor ($\lfloor x\rfloor$) or the ReLU ($\max\{0,x\}$) activation function in each neuron, and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown that Floor-ReLU networks with width $\max\{d,\,5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation rate $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$, where $\alpha\in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})\,N^{-\sqrt{L}}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\le r^\alpha$ for Hölder continuous functions), since the major term of concern in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ inside the modulus of continuity.
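
To make the stated rate concrete, the following minimal Python sketch (an illustration only, not the paper's explicit construction) evaluates the Hölder error bound $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$ and the stated width $\max\{d,\,5N+13\}$ and depth $64dL+3$, and runs a toy forward pass through a network in which each hidden neuron applies either the floor or the ReLU activation. The function names, the random weights, and the truncated depth are placeholder assumptions made for demonstration.

```python
import numpy as np

def holder_error_bound(d, N, L, alpha=1.0, lam=1.0):
    """Bound 3*lam*d**(alpha/2)*N**(-alpha*sqrt(L)) for Hölder(alpha, lam)
    functions on [0,1]^d, as stated in the abstract."""
    return 3.0 * lam * d ** (alpha / 2.0) * N ** (-alpha * np.sqrt(L))

def network_size(d, N, L):
    """Width and depth stated in the abstract: max{d, 5N+13} and 64*d*L+3."""
    return max(d, 5 * N + 13), 64 * d * L + 3

def floor_relu_forward(x, weights, biases, use_floor):
    """Toy forward pass: each hidden neuron applies floor or ReLU.
    The weights, biases, and activation pattern are random placeholders,
    not the construction from the paper."""
    h = x
    for W, b, mask in zip(weights, biases, use_floor):
        pre = W @ h + b
        h = np.where(mask, np.floor(pre), np.maximum(pre, 0.0))
    return h

if __name__ == "__main__":
    d, N, L = 2, 4, 9
    width, depth = network_size(d, N, L)
    print("width =", width, ", depth =", depth)
    print("Hölder(1,1) error bound:", holder_error_bound(d, N, L))

    # Random toy network of the stated width (depth truncated to 3 layers here).
    rng = np.random.default_rng(0)
    x = rng.random(d)
    weights = [rng.standard_normal((width, d))] + \
              [rng.standard_normal((width, width)) for _ in range(2)]
    biases = [rng.standard_normal(width) for _ in range(3)]
    use_floor = [rng.random(width) < 0.5 for _ in range(3)]
    print("toy output (first 3 entries):",
          floor_relu_forward(x, weights, biases, use_floor)[:3])
```

Note how the bound decays like $N^{-\sqrt{L}}$: increasing the depth parameter $L$ improves the rate in the exponent rather than merely in a prefactor, which is the sense in which the approximation power here exceeds that of fixed-rate constructions.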
