Stochastic Descent Analysis of Representation Learning Algorithms

Although stochastic approximation learning methods have been widely used in the machine learning literature for over 50 years, formal theoretical analyses of specific machine learning algorithms are less common because stochastic approximation theorems typically rest on assumptions that are difficult to communicate and verify. This paper presents a new stochastic approximation theorem for state-dependent noise with easily verifiable assumptions, applicable to the analysis and design of important deep learning algorithms including adaptive learning, contrastive divergence learning, stochastic descent expectation maximization, and active learning.
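As a generic illustration of the algorithm class such theorems address, the following Python sketch implements a Robbins-Monro stochastic approximation update. The quadratic loss, Gaussian gradient noise, and 1/t step-size schedule are illustrative assumptions, not the paper's construction; in particular, the i.i.d. noise here is simpler than the state-dependent noise the new theorem covers.

```python
# Minimal sketch of a Robbins-Monro stochastic approximation update.
# Assumptions (for illustration only): quadratic expected loss
# 0.5 * ||theta - theta_star||^2, zero-mean Gaussian gradient noise,
# and step sizes gamma_t = 1/t satisfying the classical annealing
# conditions sum(gamma_t) = infinity and sum(gamma_t^2) < infinity.
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([5.0, -3.0])   # initial parameter estimate
theta_star = np.zeros(2)        # minimizer of the (hypothetical) expected loss

for t in range(1, 10_001):
    gamma_t = 1.0 / t           # annealed learning rate
    # Noisy gradient observation: true gradient plus zero-mean noise
    # standing in for the stochastic training environment.
    noisy_grad = (theta - theta_star) + rng.normal(scale=0.5, size=2)
    theta = theta - gamma_t * noisy_grad

print(theta)  # drifts toward theta_star as the step sizes anneal
```

The step-size conditions are what drive convergence: the divergent sum lets the iterates travel arbitrarily far, while the summable squares damp the accumulated noise.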
