On-line learning and stochastic approximations

The convergence of online learning algorithms is analyzed using the tools of stochastic approximation theory and is proved under very weak conditions. A general framework for online learning algorithms is presented first. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. Stochastic approximation theory then provides general results that describe the convergence of all these learning algorithms at once. Revised version, October 15th 2012.
