On deep learning as a remedy for the curse of dimensionality in nonparametric regression

Assuming that a smoothness condition and a suitable restriction on the structure of the regression function hold, it is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression. The proof is based on new approximation results concerning multilayer feedforward neural networks with bounded weights and a bounded number of hidden neurons. The estimates are compared with various other approaches by using simulated data.
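
As a rough illustration of what circumventing the curse of dimensionality means, the sketch below evaluates the classical optimal rate n^(-2p/(2p+d)) for estimating a p-times smooth regression function in dimension d, and compares it with the same formula when d is replaced by a smaller effective dimension. The names `rate`, `samples_needed` and `d_star`, and all numeric values, are illustrative assumptions and are not taken from the paper; the sketch only shows the arithmetic behind the claim, not the estimate itself.

```python
def rate(n: int, p: float, d: int) -> float:
    """Classical optimal rate n^(-2p/(2p+d)) for the expected squared L2 error.

    n: sample size, p: smoothness of the regression function,
    d: dimension entering the exponent (an illustrative parameter).
    """
    if n < 1 or p <= 0 or d < 1:
        raise ValueError("need n >= 1, p > 0 and d >= 1")
    return n ** (-2.0 * p / (2.0 * p + d))


def samples_needed(eps: float, p: float, d: int) -> float:
    """Sample size at which the rate formula equals eps (constants ignored)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("need 0 < eps < 1")
    # Solve n^(-2p/(2p+d)) = eps for n.
    return eps ** (-(2.0 * p + d) / (2.0 * p))


if __name__ == "__main__":
    n, p = 10**6, 1.0
    d_star = 2  # hypothetical effective dimension under a structural restriction
    for d in (2, 10, 20, 50):
        print(f"d={d:3d}  full-dimension rate={rate(n, p, d):.4f}  "
              f"rate with d_star={rate(n, p, d_star):.4f}")
```

In this sketch the exponent shrinks toward zero as d grows, so the error bound barely improves with more data, whereas a rate that depends only on the fixed `d_star` does not degrade with the ambient dimension.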
