Influence of noises added to hidden units on learning of multilayer perceptrons and structurization of networks

This paper investigates the influence of noise added to the hidden units of multilayer perceptrons. It is shown that a skeletal structure of the network emerges when independent Gaussian noise is added to the inputs of the hidden units during error back-propagation learning. An analysis of the average behavior of back-propagation learning under such noise shows that the weights from hidden units to output units tend to become small and the outputs of the hidden units tend toward 0 or 1. The added noise thus structures the network automatically, and as a result the generalization ability of the network is expected to improve. This structurization was confirmed in experiments on pattern classification and Boolean function learning.
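The noise-injection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss, the learning rate, the network size, or the noise level, so the code below assumes a single hidden layer of sigmoid units, one sigmoid output, squared error, and online gradient descent. The names `forward`, `train`, and `sigma` are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, W2, b2, sigma=0.0, rng=None):
    """Forward pass. If sigma > 0, independent Gaussian noise with
    standard deviation sigma is added to the input of each hidden unit."""
    hidden = []
    for j in range(len(W1)):
        net = b1[j] + sum(w * xi for w, xi in zip(W1[j], x))
        if sigma > 0.0:
            net += rng.gauss(0.0, sigma)  # one independent draw per hidden unit
        hidden.append(sigmoid(net))
    out = sigmoid(b2 + sum(w * h for w, h in zip(W2, hidden)))
    return hidden, out


def train(data, n_hidden=4, sigma=0.5, epochs=2000, lr=0.5, seed=0):
    """Online back-propagation with noisy hidden-unit inputs.
    All hyperparameter defaults are assumptions, not values from the paper."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:
            hidden, out = forward(x, W1, b1, W2, b2, sigma, rng)
            # Error signal at the output for squared error and a sigmoid unit.
            d_out = (out - t) * out * (1.0 - out)
            for j in range(n_hidden):
                # Back-propagate through the noisy hidden activation,
                # using the hidden-to-output weight before it is updated.
                d_hid = d_out * W2[j] * hidden[j] * (1.0 - hidden[j])
                W2[j] -= lr * d_out * hidden[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid * x[i]
                b1[j] -= lr * d_hid
            b2 -= lr * d_out
    return W1, b1, W2, b2


if __name__ == "__main__":
    # XOR as a small Boolean function learning task.
    xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    W1, b1, W2, b2 = train(xor)
    for x, t in xor:
        hidden, out = forward(x, W1, b1, W2, b2)  # noise-free evaluation
        print(x, t, [round(h, 2) for h in hidden], round(out, 2))
```

Training once with `sigma=0.0` and once with `sigma > 0`, then comparing the printed hidden activations and the magnitudes in `W2`, is one way to look for the tendencies the abstract describes. The sketch makes no claim about what a particular run will show.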
