Stochastic Learning

This contribution presents an overview of the theoretical and practical aspects of the broad family of learning algorithms based on Stochastic Gradient Descent, including Perceptrons, Adalines, K-Means, LVQ, Multi-Layer Networks, and Graph Transformer Networks.
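
The unifying idea named here is that each listed algorithm performs stochastic gradient descent on a per-example loss Q(z, w), applying w ← w − γ_t ∇_w Q(z_t, w) after each example z_t. The following Python code is a minimal sketch of that idea for three of the listed algorithms. It uses common textbook losses: squared error for Adaline, max(0, −y w·x) for the Perceptron, and half the squared distance to the nearest centroid for K-Means. The function names (`sgd`, `adaline_grad`, `perceptron_grad`, `online_kmeans`), the per-epoch shuffling, and the step-size choices are hypothetical and for illustration only. They are not the author's implementation.

```python
import random

# Illustrative sketch: every name below is hypothetical, not from the original work.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd(data, w, grad, lr, epochs=10, seed=0):
    """Generic SGD: for each example z, update w <- w - lr(t) * grad(z, w)."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    w = list(w)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a random order each epoch
        for z in data:
            g = grad(z, w)
            step = lr(t)
            w = [wi - step * gi for wi, gi in zip(w, g)]
            t += 1
    return w


def adaline_grad(z, w):
    """Adaline / LMS: Q(z, w) = 1/2 (y - w.x)^2, gradient -(y - w.x) x."""
    x, y = z
    err = y - dot(w, x)
    return [-err * xi for xi in x]


def perceptron_grad(z, w):
    """Perceptron: Q(z, w) = max(0, -y w.x); gradient -y x on a mistake, else 0."""
    x, y = z
    if y * dot(w, x) <= 0:  # misclassified (or exactly on the boundary)
        return [-y * xi for xi in x]
    return [0.0] * len(x)


def online_kmeans(points, centroids, epochs=1, seed=0):
    """Online K-Means: Q(x, w) = min_k 1/2 ||x - w_k||^2.

    Only the nearest centroid receives a gradient step, with per-centroid
    step size 1/n_k, so each centroid tracks the running mean of its points.
    """
    rng = random.Random(seed)
    pts = list(points)
    cents = [list(c) for c in centroids]
    counts = [0] * len(cents)
    for _ in range(epochs):
        rng.shuffle(pts)
        for x in pts:
            # index of the centroid closest to x (squared Euclidean distance)
            k = min(range(len(cents)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, cents[j])))
            counts[k] += 1
            step = 1.0 / counts[k]
            cents[k] = [c + step * (a - c) for a, c in zip(x, cents[k])]
    return cents
```

As a usage example, `sgd(data, [0.0, 0.0, 0.0], adaline_grad, lr=lambda t: 0.1)` on noise-free linear data (with a constant 1 feature for the bias) drives the weights toward the generating coefficients. With `perceptron_grad` on separable data, the weights stop changing once every training example lies strictly on the correct side. K-Means is written as a separate function because only the winning centroid is updated, which does not fit a single flat weight vector.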
