Universal approximation bounds for superpositions of a sigmoidal function

Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The approximated function is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform. The nonlinear parameters associated with the sigmoidal nodes, as well as the parameters of linear combination, are adjusted in the approximation. In contrast, it is shown that for series expansions with n terms, in which only the parameters of linear combination are adjusted, the integrated squared approximation error cannot be made smaller than order (1/n)^(2/d) uniformly for functions satisfying the same smoothness assumption, where d is the dimension of the input to the function. For the class of functions examined, the approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings.
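As a rough illustration of the objects the abstract refers to, the following sketch evaluates a network with one layer of n sigmoidal nodes and compares the two error orders, 1/n and (1/n)^(2/d). The function names (`network`, `num_parameters`, `rate_orders`), the use of the logistic function as the sigmoid, and the array layout are assumptions made for this example; constants in the bounds are omitted, so only the orders are shown.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, used here as one example of a sigmoidal nonlinearity.
    return 1.0 / (1.0 + np.exp(-z))

def network(x, a, b, c, c0):
    """Evaluate f_n(x) = sum_k c[k] * sigmoid(a[k] . x + b[k]) + c0.

    x: (m, d) inputs; a: (n, d) and b: (n,) are the nonlinear (inner)
    parameters; c: (n,) and c0 are the parameters of linear combination.
    Returns an array of shape (m,).
    """
    return sigmoid(x @ a.T + b) @ c + c0

def num_parameters(n, d):
    # Each node has d inner weights, one bias, and one outer weight; plus c0.
    return n * (d + 2) + 1

def rate_orders(n, d):
    # Orders only (constants omitted): the network rate 1/n and the
    # lower-bound order (1/n)^(2/d) for n-term linear series expansions.
    return 1.0 / n, (1.0 / n) ** (2.0 / d)

if __name__ == "__main__":
    for d in (2, 10, 50):
        net, series = rate_orders(100, d)
        print(f"d={d}: network order {net:.4f}, series order {series:.4f}")
```

For fixed n, the order (1/n)^(2/d) approaches 1 as d grows, while 1/n does not depend on d, which is the sense in which the abstract calls the network rate advantageous in high-dimensional settings.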
