On the optimality of neural-network approximation using incremental algorithms

The problem of approximating functions by neural networks using incremental algorithms is studied. For functions belonging to a rather general class, characterized by certain smoothness properties with respect to the L2 norm, we derive upper bounds on the approximation error, where the error is measured in the Lq norm for 1 ≤ q ≤ ∞. These results extend previous work, applicable for q = 2, and provide an explicit algorithm that achieves the derived approximation error rate. For q ≤ 2, near-optimal rates of convergence are demonstrated. A gap remains, however, with respect to a recently established lower bound for q > 2, although the rates achieved are provably better than those obtained by optimal linear approximation. Extensions of the results from the L2 norm to Lp are also discussed. A further conclusion from our results is that no generality is lost by restricting to networks with positive hidden-to-output weights. Moreover, explicit bounds on the size of the hidden-to-output weights are given which suffice to guarantee the stated convergence rates.
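To make the incremental idea concrete, the following is a minimal sketch of a relaxed greedy scheme in the spirit of Jones (1992) and Barron (1993): at step n the current approximant is mixed with one new sigmoidal unit chosen from a finite candidate dictionary, using the convex-combination update f_n = (1 - a_n) f_{n-1} + a_n s g_n with a_n = 2/(n+1). The toy target, the grid of candidate weights and biases, the scale parameter, and the function name incremental_fit are illustrative assumptions for this sketch, not the exact construction analysed in the paper.

```python
import numpy as np

# Discretise [0, 1]; the L2 error is estimated on a uniform grid.
x = np.linspace(0.0, 1.0, 400)
target = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)  # toy target

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Finite dictionary of shifted/scaled sigmoidal units (an illustrative choice).
weights = np.linspace(-20.0, 20.0, 41)
biases = np.linspace(-20.0, 20.0, 41)
dictionary = np.array([sigmoid(w * x + b) for w in weights for b in biases])

def incremental_fit(target, dictionary, n_units, scale=3.0):
    """Relaxed greedy update: f_n = (1 - a_n) f_{n-1} + a_n * s * g_n."""
    f = np.zeros_like(target)
    for n in range(1, n_units + 1):
        a = 2.0 / (n + 1)  # standard relaxation schedule
        best_err, best_f = np.inf, f
        # Scan the dictionary; both output-weight signs are tried here for
        # simplicity, even though positive weights can be shown to suffice.
        for g in dictionary:
            for s in (scale, -scale):
                cand = (1.0 - a) * f + a * s * g
                err = np.sqrt(np.mean((target - cand) ** 2))
                if err < best_err:
                    best_err, best_f = err, cand
        f = best_f
        print(f"n = {n:3d}   L2 error ~ {best_err:.4f}")
    return f

approx = incremental_fit(target, dictionary, n_units=30)
```

In this setting the classical greedy analysis gives an L2 error of order n^(-1/2) for targets in (a scaled) convex hull of the dictionary; the sketch only illustrates the incremental structure of such algorithms, not the refined Lq rates derived in the paper.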
