Convergence of learning algorithms with constant learning rates

The behavior of neural network learning algorithms with a small, constant learning rate ε in stationary, random input environments is investigated. It is rigorously established that, in the sense of weak convergence of random processes as ε tends to zero, the sequence of weight estimates can be approximated by the solution of an associated ordinary differential equation. As applications, backpropagation in feedforward architectures and some feature-extraction algorithms are studied in more detail.
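To make the ODE-approximation idea concrete, the following is a minimal numerical sketch (not from the paper) using Oja's principal-component rule, a standard example of a feature-extraction algorithm of the kind studied here. With stationary random inputs and a small constant learning rate ε, the stochastic weight iterates hover near the trajectory of the mean-field ODE dw/dt = Cw − (wᵀCw)w, whose stable equilibrium is the unit principal eigenvector of the input covariance C. All constants below (the covariance, ε, the number of steps) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stationary input environment: zero-mean Gaussian with covariance C.
# Principal eigenvector of C is e1 = (1, 0, 0).
C = np.diag([3.0, 1.0, 0.5])
L = np.linalg.cholesky(C)

eps = 0.01                      # small constant learning rate
w = rng.normal(size=3)
w /= np.linalg.norm(w)

for _ in range(20000):
    x = L @ rng.normal(size=3)  # one i.i.d. input sample
    y = x @ w                   # neuron output
    # Oja's rule; its average over x is the ODE drift Cw - (w^T C w) w.
    w += eps * y * (x - y * w)

# For small eps, w stays within O(sqrt(eps)) fluctuations of the
# ODE equilibrium, so it aligns with the principal eigenvector.
cosine = abs(w[0]) / np.linalg.norm(w)
print(f"alignment with principal eigenvector: {cosine:.3f}")
```

Shrinking ε reduces the residual fluctuation around the equilibrium (at the cost of slower convergence), which is the trade-off the weak-convergence analysis quantifies.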
