Kernel regression and backpropagation training with noise

One method proposed for improving the generalization capability of a feedforward network trained with the backpropagation algorithm is to use artificial training vectors, obtained by adding noise to the original training vectors. The authors discuss the connection of such backpropagation training with noise to kernel density estimation and kernel regression estimation. Using simulated examples, they compare backpropagation, backpropagation with noise, and kernel regression in mapping estimation and pattern classification contexts. They conclude that additive noise can improve the generalization capability of a feedforward network trained with backpropagation. The magnitude of the noise cannot be selected blindly, though; cross-validation-type procedures seem well suited to selecting it. Kernel regression, however, seems to perform well whenever backpropagation with noise performs well.
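The link between noise training and kernel methods can be illustrated numerically. Assume zero-mean Gaussian noise with standard deviation h is added to the inputs of the training vectors while the targets are kept. The jittered inputs are then draws from a Gaussian kernel (Parzen) density estimate with bandwidth h, and the function minimizing the expected squared error over the jittered data is the Nadaraya-Watson kernel regression estimate with the same kernel. The sketch below does not train a network. Instead, it estimates that least-squares optimum directly from jittered samples, which a sufficiently flexible network trained to convergence would approximate, and compares it with kernel regression. Leave-one-out cross-validation is used here as one example of a cross-validation-type procedure for choosing h. All function names, the sine test problem, the candidate noise levels, and the window width are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density; serves both as the noise density and the kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def parzen_density(x_query, x_train, h):
    """Kernel (Parzen) density estimate: the density of the noise-corrupted inputs."""
    x_query = np.atleast_1d(x_query)
    u = (x_query[:, None] - x_train[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h


def nadaraya_watson(x_query, x_train, y_train, h):
    """Kernel regression: kernel-weighted average of the training targets."""
    x_query = np.atleast_1d(x_query)
    w = gaussian_kernel((x_query[:, None] - x_train[None, :]) / h)
    return (w @ y_train) / w.sum(axis=1)


def jittered_copies(x_train, y_train, h, n_copies, rng):
    """Artificial training vectors: inputs perturbed with N(0, h^2) noise, targets kept."""
    x_rep = np.repeat(x_train, n_copies)
    y_rep = np.repeat(y_train, n_copies)
    return x_rep + h * rng.standard_normal(x_rep.size), y_rep


def loo_cv_error(x_train, y_train, h):
    """Leave-one-out mean squared error of kernel regression with bandwidth h."""
    w = gaussian_kernel((x_train[:, None] - x_train[None, :]) / h)
    np.fill_diagonal(w, 0.0)  # each point is predicted without itself
    pred = (w @ y_train) / w.sum(axis=1)
    return np.mean((y_train - pred) ** 2)


# Illustrative setup (assumed, not from the paper): noisy samples of sin(x).
rng = np.random.default_rng(0)
x_train = np.linspace(-2.0, 2.0, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)

# Pick the noise standard deviation (= kernel bandwidth) by leave-one-out CV.
candidate_h = [0.1, 0.2, 0.3, 0.5, 0.8]
h = min(candidate_h, key=lambda hh: loo_cv_error(x_train, y_train, hh))

# Least-squares optimum on jittered data at x0, estimated by averaging the
# targets of jittered samples that land in a narrow window around x0.
x_jit, y_jit = jittered_copies(x_train, y_train, h, 20000, rng)
x0 = 0.5
near = np.abs(x_jit - x0) < 0.01
mc_estimate = y_jit[near].mean()
kr_estimate = nadaraya_watson(x0, x_train, y_train, h)[0]
print(f"h={h}, jittered-data optimum ~ {mc_estimate:.3f}, kernel regression = {kr_estimate:.3f}")
```

The two printed values should agree up to Monte Carlo error, since the conditional mean of the targets given a jittered input x0 is exactly the kernel-weighted average computed by `nadaraya_watson`. This is also why the noise magnitude behaves like a smoothing parameter: too small and the estimate interpolates the data, too large and it oversmooths.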