K-nearest Neighbors Directed Noise Injection in Multilayer Perceptron Training

The relation between classifier complexity and learning set size is of great importance in discriminant analysis. One way to address the complexity control problem is to add noise to the training objects, thereby enlarging the training set. Both the amount and the directions of noise injection are important factors that determine how effective noise injection is for classifier training. In this paper, the effect of injecting Gaussian spherical noise and k-nearest neighbors directed noise on the performance of multilayer perceptrons is studied. As an analytical investigation is intractable for multilayer perceptrons, a theoretical analysis is carried out for statistical classifiers instead, with the goal of gaining a better understanding of the effect of noise injection on the accuracy of sample-based classifiers. Both empirical and theoretical studies show that k-nearest neighbors directed noise injection is preferable to Gaussian spherical noise injection for data with low intrinsic dimensionality.
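The abstract does not spell out the two noise models, so the following is only a minimal sketch under one plausible reading: Gaussian spherical noise adds isotropic perturbations x' = x + sigma * eps with eps ~ N(0, I), while k-nearest neighbors directed noise perturbs each training object along the vectors toward its k nearest neighbors, so that the injected samples stay close to the local structure of the data. The function names, parameter values, and the choice of lambda ~ |N(0, scale^2)| below are illustrative assumptions, not the authors' method.

```python
# Sketch of the two augmentation schemes contrasted in the paper (assumed forms,
# not the authors' exact models). Requires numpy and scikit-learn.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def gaussian_spherical_noise(X, n_copies=5, sigma=0.1, rng=None):
    """Isotropic augmentation: x' = x + sigma * N(0, I)."""
    rng = np.random.default_rng(rng)
    noisy = [X + sigma * rng.standard_normal(X.shape) for _ in range(n_copies)]
    return np.vstack([X, *noisy])


def knn_directed_noise(X, n_copies=5, k=3, scale=0.5, rng=None):
    """Neighbor-directed augmentation (assumed form):
    x' = x + lambda * (x_j - x), where x_j is a random one of the
    k nearest neighbors of x and lambda ~ |N(0, scale^2)|."""
    rng = np.random.default_rng(rng)
    # k + 1 neighbors because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbors = idx[:, 1:]  # drop the self-match in column 0
    out = [X]
    for _ in range(n_copies):
        j = rng.integers(0, k, size=len(X))  # one random neighbor per object
        direction = X[neighbors[np.arange(len(X)), j]] - X
        lam = np.abs(rng.normal(0.0, scale, size=(len(X), 1)))
        out.append(X + lam * direction)
    return np.vstack(out)


if __name__ == "__main__":
    # Toy data on a 1-D curve embedded in 10-D: low intrinsic dimensionality.
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 1, size=(50, 1))
    X = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                   t.repeat(8, axis=1)])
    X_sph = gaussian_spherical_noise(X)  # noise spreads into all 10 directions
    X_knn = knn_directed_noise(X)        # noise stays near the 1-D curve
    print(X_sph.shape, X_knn.shape)
```

The toy example illustrates the paper's thesis: when the data occupy a low-dimensional manifold in a high-dimensional feature space, spherical noise pushes the generated objects off the manifold into empty directions, whereas neighbor-directed noise keeps them along it, so the enlarged training set better reflects the true distribution.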
