Deep Network Approximation with Discrepancy Being Reciprocal of Width to Power of Depth

A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) and ReLU ($\max\{0,x\}$) activation functions, and hence we refer to such networks as Floor-ReLU networks. It is shown by construction that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can pointwise approximate a Lipschitz continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\mu\sqrt{d}\,N^{-\sqrt{L}}$, where $\mu$ is the Lipschitz constant of $f$. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})\,N^{-\sqrt{L}}$. As a consequence, this new network overcomes the curse of dimensionality in approximation power, since this approximation order is essentially $\sqrt{d}$ times a function of $N$ and $L$ that is independent of $d$.
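
To make the stated rate concrete, the following minimal sketch evaluates the width $\max\{d,\,5N+13\}$, depth $64dL+3$, and error bound $3\mu\sqrt{d}\,N^{-\sqrt{L}}$ from the statement above for a few depth parameters; the helper name `floor_relu_bound` is ours for illustration and is not part of the paper.

```python
import math

def floor_relu_bound(d, N, L, mu=1.0):
    """Width, depth, and error bound of the Floor-ReLU construction for a
    mu-Lipschitz function on [0,1]^d, per the statement above:
    width max{d, 5N+13}, depth 64dL+3, error 3*mu*sqrt(d)*N**(-sqrt(L))."""
    width = max(d, 5 * N + 13)
    depth = 64 * d * L + 3
    error = 3 * mu * math.sqrt(d) * N ** (-math.sqrt(L))
    return width, depth, error

# With the width parameter N fixed, raising the depth parameter L drives
# the bound down like N^(-sqrt(L)) while the network width never changes.
for L in (1, 4, 16, 64):
    width, depth, error = floor_relu_bound(d=10, N=8, L=L)
    print(f"L={L:3d}: width={width}, depth={depth}, bound={error:.3e}")
```

Note how the bound decays like $N^{-\sqrt{L}}$ at fixed width, which is the sense in which the curse of dimensionality is avoided: the dimension $d$ enters only through the $\sqrt{d}$ prefactor and the depth $64dL+3$, not through the rate in $N$ and $L$.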
