An analysis of noise in recurrent neural networks: convergence and generalization

This paper concerns the effect of noise on the performance of recurrent neural networks. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent networks during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that the best overall performance is achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function, which can be viewed as an anticipatory agent that aids convergence. This term appears to locate promising regions of weight space in the early stages of training, when the training error is large, and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations in which the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual-parity grammar from temporal strings for all noise models, and on learning a randomly generated six-state grammar with the predicted best noise model.
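The decomposition into first- and second-order contributions can be made concrete with a standard second-order Taylor expansion of the error around the noise-free weights. The notation below (error E, weights w_i, zero-mean additive noise epsilon_i) is generic and illustrative, not taken from the paper itself, which analyzes the recurrent, per-time-step case in detail.

```latex
% Illustrative second-order expansion of the error under additive
% synaptic noise \epsilon (zero mean, variance \sigma^2);
% generic notation, not the paper's own derivation.
E(\mathbf{w} + \boldsymbol{\epsilon})
  \approx E(\mathbf{w})
  + \sum_i \epsilon_i \,\frac{\partial E}{\partial w_i}
  + \frac{1}{2} \sum_{i,j} \epsilon_i \epsilon_j \,
    \frac{\partial^2 E}{\partial w_i \,\partial w_j}.
```

The expansion only indicates where the two contributions originate: in the paper's analysis, the first-order contribution acts as the regularizer that shapes internal representations, while the second-order contribution is the curvature-dependent term credited with aiding convergence.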

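A minimal sketch may help make the predicted best noise model concrete: fresh zero-mean additive noise is drawn for the recurrent weight matrix at every time step of the forward pass, during training only. The network here is a plain first-order sigmoid RNN stand-in rather than any particular architecture from the paper, and all shapes, variable names, and the noise level sigma are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of per-time-step additive
# synaptic noise in a simple sigmoid RNN processing a temporal string.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def noisy_forward(W_rec, W_in, W_out, inputs, sigma=0.05, rng=None):
    """Run the RNN over a temporal input string, drawing independent
    additive noise for the recurrent weights at each time step."""
    rng = rng or np.random.default_rng()
    state = np.zeros(W_rec.shape[0])
    for x_t in inputs:  # one input symbol per time step
        W_noisy = W_rec + sigma * rng.standard_normal(W_rec.shape)
        state = sigmoid(W_noisy @ state + W_in @ x_t)
    return sigmoid(W_out @ state)  # accept/reject score for the string

# Example: score one binary string with randomly initialized weights.
rng = np.random.default_rng(0)
n_state, n_in = 6, 2
W_rec = 0.1 * rng.standard_normal((n_state, n_state))
W_in = 0.1 * rng.standard_normal((n_state, n_in))
W_out = 0.1 * rng.standard_normal((1, n_state))
string = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(noisy_forward(W_rec, W_in, W_out, string, sigma=0.05, rng=rng))
```

At test time the same forward pass would be run with sigma set to zero, so the noise acts purely as a training-time perturbation.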