Regularizing the effect of input noise injection in feedforward neural networks training

Injecting input noise during feedforward neural network (FNN) training can markedly improve generalization performance. Prior work justifies this by showing that noise injection is equivalent to a smoothing regularization, with the input noise variance playing the role of the regularization parameter. The success of this approach therefore depends on an appropriate choice of the input noise variance. However, it is often not known a priori whether the degree of smoothness imposed on the FNN mapping is consistent with the unknown function to be approximated. To gain better control over this smoothing effect, a loss function is proposed that balances the smoothed fit induced by noise injection against the precision of the approximation. Its second term, which penalizes the undesirable effects of input noise injection by controlling the deviation of the randomly perturbed loss, is obtained by expressing a distance between the original loss function and its randomly perturbed version. In fact, this term can be derived in general for parametric models that satisfy a Lipschitz property. An example is included to illustrate the effectiveness of learning with the proposed loss function when noise injection is used.
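To make the idea concrete, the following is a minimal sketch of one way such a combined objective could be implemented in PyTorch. The squared-difference form of the deviation penalty, the weight `lam`, the noise level `sigma`, the network architecture, and the synthetic data are all illustrative assumptions rather than the paper's exact construction.

```python
# Sketch: input noise injection plus a penalty on the deviation between
# the original loss and its randomly perturbed version. The penalty form,
# lam, sigma, architecture, and data are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic 1-D regression problem: y = sin(2*pi*x) + observation noise.
x = torch.rand(256, 1)
y = torch.sin(2 * torch.pi * x) + 0.1 * torch.randn_like(x)

model = nn.Sequential(nn.Linear(1, 20), nn.Tanh(), nn.Linear(20, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

sigma = 0.05   # input noise standard deviation (smoothing strength)
lam = 1.0      # weight of the deviation penalty (assumed value)

for epoch in range(2000):
    opt.zero_grad()
    x_noisy = x + sigma * torch.randn_like(x)   # input noise injection

    loss_clean = F.mse_loss(model(x), y)        # original (unperturbed) loss
    loss_noisy = F.mse_loss(model(x_noisy), y)  # randomly perturbed loss

    # Combined objective: smoothed fit from noise injection, plus a term
    # controlling the deviation of the perturbed loss from the original.
    loss = loss_noisy + lam * (loss_noisy - loss_clean) ** 2
    loss.backward()
    opt.step()
```

Setting `lam = 0` recovers plain noise-injection training; a larger `lam` keeps the perturbed loss closer to the clean loss, limiting how much smoothing the injected noise can impose.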
