Optimal Approximation Rate of ReLU Networks in terms of Width and Depth

Abstract. This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, whereas existing results are only nearly optimal, missing the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\sqrt{d}\,\omega_f\big((N^2L^2\ln N)^{-1/d}\big)\big)$, where $\omega_f(\cdot)$ is the modulus of continuity of $f$. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}\big(\tfrac{\lambda}{W\ln W}\big)$, a rate not previously established in the literature for fixed-depth ReLU networks.
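For readability, the main bound claimed in the abstract can be collected into a single display. This is a paraphrase in the abstract's own notation, not the paper's exact theorem statement; the constant $C$ below is an assumption standing in for the hidden constant of the $\mathcal{O}(\cdot)$ notation. For every $f$ that is Hölder continuous of order $\alpha\in(0,1]$ with constant $\lambda>0$ on $[0,1]^d$ and every $N,L\in\mathbb{N}^+$, there exists a ReLU network $\phi$ with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ such that
\[
\|f-\phi\|_{L^\infty([0,1]^d)} \;\le\; C\,\lambda\,\sqrt{d}\,\big(N^2L^2\ln N\big)^{-\alpha/d}.
\]
Reading off the two extremes: with depth held fixed ($L=\mathcal{O}(1)$) the error decays like $(N^2\ln N)^{-\alpha/d}$ in the width parameter $N$, and with width held fixed the error decays like $L^{-2\alpha/d}$ in the depth parameter $L$, which is the sense in which the rate is optimal in width and depth separately.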
