On Langevin Updating in Multilayer Perceptrons

The Langevin updating rule, in which noise is added to the weights during learning, is presented and shown to improve learning on problems with initially ill-conditioned Hessians. This is particularly important for multilayer perceptrons with many hidden layers, which often have ill-conditioned Hessians. In addition, Manhattan updating is shown to have a similar effect.
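
To make the two update rules concrete, the following Python sketch (not the paper's code; the toy quadratic loss, learning rate, and noise schedule are illustrative assumptions) contrasts plain gradient descent, Langevin updating with annealed Gaussian weight noise, and Manhattan updating on a problem with an ill-conditioned Hessian.

```python
# Minimal sketch of three weight-update rules on a toy quadratic loss
# with an ill-conditioned Hessian. eta, sigma0, and the noise-decay
# schedule are assumptions for illustration, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

# Toy loss: L(w) = 0.5 * w^T H w, with condition number 10^4.
H = np.diag([100.0, 0.01])
grad = lambda w: H @ w

def train(update, steps=2000, eta=0.01):
    w = np.array([1.0, 1.0])
    for t in range(steps):
        w = update(w, grad(w), t, eta)
    return 0.5 * w @ H @ w   # final loss value

# Plain gradient descent.
def gd(w, g, t, eta):
    return w - eta * g

# Langevin updating: add zero-mean Gaussian noise to each weight update,
# with the noise amplitude annealed over time (schedule is an assumption).
def langevin(w, g, t, eta, sigma0=0.1):
    sigma = sigma0 / (1.0 + 0.01 * t)
    return w - eta * g + sigma * rng.standard_normal(w.shape)

# Manhattan updating: take a fixed-size step in the direction given by
# the sign of each gradient component, ignoring its magnitude.
def manhattan(w, g, t, eta):
    return w - eta * np.sign(g)

for name, rule in [("gradient descent", gd),
                   ("Langevin", langevin),
                   ("Manhattan", manhattan)]:
    print(f"{name:>16s}: final loss = {train(rule):.4g}")
```

Because both the added noise and the sign-only step are insensitive to the gradient's magnitude, they help the weights escape the flat, badly scaled directions that dominate an ill-conditioned error surface early in training.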
