Regularized Neural Networks: Some Convergence Rate Results

In a recent paper, Poggio and Girosi (1990) proposed a class of neural networks obtained from the theory of regularization. Regularized networks are capable of approximating any continuous function on a compactum arbitrarily well. In this paper we consider in detail the learning problem for the one-dimensional case. We show that, when the output data are observed with noise, regularized networks can learn and approximate (on compacta) elements of certain classes of Sobolev spaces, known as reproducing kernel Hilbert spaces (RKHS), at a nonparametric rate that optimally exploits the smoothness properties of the unknown mapping. In particular, we show that the total squared error, given by the sum of the squared bias and the variance, approaches zero at the rate $n^{-2m/(2m+1)}$, where $m$ denotes the order of differentiability of the true unknown function and $n$ the sample size. On the other hand, if the unknown mapping is a continuous function but does not belong to an RKHS, a unique regularized solution still exists, but it is no longer guaranteed to converge in mean square to a well-defined limit. Moreover, even if such a solution converges, the total squared error is bounded away from zero for all $n$ sufficiently large.
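As a rough numerical illustration of this setting (not the paper's construction), the sketch below fits a Tikhonov-regularized estimator in an RKHS, i.e., kernel ridge regression, to noisy one-dimensional data. The Gaussian kernel, the noise level, and the schedule $\lambda \propto n^{-2m/(2m+1)}$ with $m = 2$ are assumptions made purely for illustration; the paper's analysis concerns Sobolev-type RKHSs and the regularized network construction of Poggio and Girosi, and a finite simulation with this particular kernel will only show the total squared error shrinking with $n$, not the exact theoretical rate.

```python
import numpy as np

# Minimal sketch: Tikhonov regularization in an RKHS (kernel ridge regression)
# fit to noisy samples of a smooth 1-D target. Kernel choice, noise level, and
# the lambda schedule are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)          # smooth "true" mapping on [0, 1]

def gaussian_kernel(x, z, width=0.2):
    return np.exp(-((x[:, None] - z[None, :]) ** 2) / (2 * width ** 2))

def fit_predict(x_train, y_train, x_test, lam):
    # Regularized solution: minimize (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||_RKHS^2.
    # By the representer theorem the minimizer is f(x) = k(x, X) @ alpha with
    # alpha = (K + n * lam * I)^{-1} y.
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return gaussian_kernel(x_test, x_train) @ alpha

x_test = np.linspace(0.0, 1.0, 400)
for n in (50, 200, 800, 3200):
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = target(x) + 0.1 * rng.standard_normal(n)   # output data observed with noise
    # Heuristic schedule lam ~ n^{-2m/(2m+1)} with m = 2 (an assumption);
    # the paper's rate result concerns the optimal choice in an RKHS setting.
    lam = n ** (-4.0 / 5.0)
    err = np.mean((fit_predict(x, y, x_test, lam) - target(x_test)) ** 2)
    print(f"n = {n:5d}   mean squared error = {err:.5f}")
```

Running the script prints a mean squared error that decreases as $n$ grows, which is the qualitative behavior (squared bias plus variance tending to zero) that the rate result quantifies.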
