Using additive noise in back-propagation training

The possibility of improving the generalization capability of a neural network by adding noise to the training samples is discussed. The network considered is a feedforward layered neural network trained with the back-propagation algorithm. Back-propagation training is viewed as nonlinear least-squares regression, and the additive noise is interpreted as generating a kernel estimate of the probability density that describes the training vector distribution. Two application types are considered: pattern classifier networks and the estimation of a nonstochastic mapping from data corrupted by measurement errors. It is not proved that adding noise to the training vectors always improves network generalization. However, the analysis suggests mathematically justified rules for choosing the characteristics of the noise when additive noise is used in training. Results from mathematical statistics are used to establish various asymptotic consistency properties of the proposed method, and numerical simulations support its applicability.
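The core idea can be illustrated with a minimal sketch: each epoch, the training inputs are jittered with zero-mean Gaussian noise before the back-propagation step, which amounts to sampling from a Gaussian-kernel density estimate centred on the training vectors, with the noise standard deviation playing the role of the kernel bandwidth. The network size, learning rate, bandwidth, and target function below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: a nonstochastic mapping y = sin(x) sampled at 20 points.
x = np.linspace(-3.0, 3.0, 20).reshape(-1, 1)
y = np.sin(x)

# One-hidden-layer tanh network (sizes are illustrative assumptions).
W1 = rng.normal(0.0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)

h = 0.1    # noise std; acts as the kernel bandwidth of the density estimate
lr = 0.05  # learning rate for full-batch gradient descent

def forward(inp):
    a = np.tanh(inp @ W1 + b1)
    return a, a @ W2 + b2

for epoch in range(3000):
    # Jitter the inputs: equivalent to drawing fresh samples from a
    # Gaussian-kernel estimate of the training input density.
    xn = x + rng.normal(0.0, h, x.shape)
    a, out = forward(xn)
    err = out - y                      # least-squares residual
    # Back-propagation for the squared-error criterion.
    gW2 = a.T @ err / len(x); gb2 = err.mean(0)
    da = (err @ W2.T) * (1.0 - a**2)   # derivative through tanh
    gW1 = xn.T @ da / len(x); gb1 = da.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Evaluate on the clean (noise-free) training inputs.
_, pred = forward(x)
mse = float(np.mean((pred - y) ** 2))
```

Setting `h = 0` recovers plain back-propagation on the fixed training set; a moderate `h` smooths the fitted mapping, at the cost of bias when the bandwidth is too large.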
