Accelerated learning for Restricted Boltzmann Machine with momentum term

Restricted Boltzmann Machines are generative models that can be used as standalone feature extractors or to initialize the parameters of deeper models. Typically, these models are trained with the Contrastive Divergence algorithm, an approximation of stochastic gradient descent. In this paper, we aim to speed up the convergence of the learning procedure by applying the classical momentum method and Nesterov's accelerated gradient technique. We evaluate both techniques empirically on the MNIST image dataset.
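The two acceleration schemes differ only in where the gradient is evaluated: classical momentum takes the CD gradient at the current weights, while Nesterov's variant takes it at the look-ahead point reached by the velocity. A minimal sketch of both, built on a CD-1 gradient estimate for a binary RBM (biases omitted for brevity; the function names `cd1_gradient` and `train` are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(W, v0, rng):
    """CD-1 estimate of the log-likelihood gradient w.r.t. W
    for a binary RBM (bias terms omitted for brevity)."""
    h0_prob = sigmoid(v0 @ W)                      # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T)                    # one-step reconstruction
    h1_prob = sigmoid(v1_prob @ W)                 # negative phase
    return (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]

def train(W, data, lr=0.05, mu=0.9, steps=200, nesterov=False, seed=0):
    """Gradient ascent on the CD-1 estimate with a momentum term.
    With nesterov=True the gradient is taken at the look-ahead
    point W + mu * vel (Nesterov's accelerated gradient)."""
    rng = np.random.default_rng(seed)
    vel = np.zeros_like(W)
    for _ in range(steps):
        point = W + mu * vel if nesterov else W
        grad = cd1_gradient(point, data, rng)
        vel = mu * vel + lr * grad
        W = W + vel
    return W

def recon_error(W, v0):
    """Mean squared one-step reconstruction error."""
    v1 = sigmoid(sigmoid(v0 @ W) @ W.T)
    return float(np.mean((v0 - v1) ** 2))
```

A toy usage: on a dataset of two binary patterns, both variants drive the reconstruction error down from a small random initialization, e.g. `W = 0.01 * np.random.default_rng(0).standard_normal((4, 3))` trained on rows of `[1,1,0,0]` and `[0,0,1,1]`.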
