Rates of convergence for adaptive regression estimates with multiple hidden layer feedforward neural networks

We present a general bound on the expected L2 error of adaptive least squares estimates. Applying it to regression function estimates based on multiple hidden layer feedforward neural networks, we obtain rates of convergence that are optimal up to a logarithmic factor for Lipschitz classes, as well as fast rates of convergence for certain classes of regression functions, such as additive functions.
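For orientation, here is a minimal sketch of the standard setting behind these rates (the notation m, m_n, \mu, p and d is assumed here, not spelled out in the abstract): given i.i.d. data (X_1, Y_1), ..., (X_n, Y_n) distributed as (X, Y), the regression function is m(x) = E[Y | X = x], and the quality of an estimate m_n is measured by the expected L2 error

\[
\mathbf{E} \int |m_n(x) - m(x)|^2 \, \mu(dx),
\]

where \mu denotes the distribution of X. For Lipschitz classes of smoothness p on \mathbf{R}^d, the classical optimal rate of convergence of this error is n^{-2p/(2p+d)}, so "optimal up to a logarithmic factor" means a bound of the generic form

\[
\mathbf{E} \int |m_n(x) - m(x)|^2 \, \mu(dx) = O\big( (\log n)^{c} \cdot n^{-2p/(2p+d)} \big)
\]

for some constant c > 0 (the exact logarithmic exponent depends on the theorem). For additive regression functions, i.e.

\[
m(x) = m^{(1)}(x^{(1)}) + \dots + m^{(d)}(x^{(d)}),
\]

the corresponding rate n^{-2p/(2p+1)} no longer depends on the dimension d, which is the sense in which these rates are "fast": the additive structure avoids the curse of dimensionality.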
