Deep Boltzmann Machines

We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks.
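The two estimators described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy machine with binary units, two hidden layers, and no bias terms. The data-dependent statistics come from a few mean-field fixed-point updates, and the data-independent statistics come from Gibbs chains whose states are carried over between updates. All names (`mean_field`, `gibbs_step`, `train_step`), the layer sizes, the toy data, and the constant learning rate are assumptions made for illustration. Pre-training is omitted, so mean-field starts from 0.5 rather than from a bottom-up pass.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def up(W, x):
    # Total input sent to the layer above: x^T W.
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def down(W, y):
    # Total input sent to the layer below: W y.
    return [sum(W[i][j] * y[j] for j in range(len(y))) for i in range(len(W))]

def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def mean_field(v, W1, W2, n_iters=10):
    # Variational (fully factorized) posterior over h1 and h2 given data v.
    mu2 = [0.5] * len(W2[0])
    for _ in range(n_iters):
        mu1 = [sigmoid(a + b) for a, b in zip(up(W1, v), down(W2, mu2))]
        mu2 = [sigmoid(a) for a in up(W2, mu1)]
    return mu1, mu2

def gibbs_step(v, h2, W1, W2, rng):
    # Block Gibbs: sample h1 given (v, h2), then v and h2 given h1.
    h1 = sample([sigmoid(a + b) for a, b in zip(up(W1, v), down(W2, h2))], rng)
    v = sample([sigmoid(a) for a in down(W1, h1)], rng)
    h2 = sample([sigmoid(a) for a in up(W2, h1)], rng)
    return v, h1, h2

def accumulate(g, x, y, scale):
    # g += scale * outer(x, y)
    for i in range(len(x)):
        for j in range(len(y)):
            g[i][j] += scale * x[i] * y[j]

def train_step(data, chains, W1, W2, lr, rng):
    # Gradient estimate: mean-field statistics minus persistent-chain statistics.
    g1 = [[0.0] * len(W1[0]) for _ in range(len(W1))]
    g2 = [[0.0] * len(W2[0]) for _ in range(len(W2))]
    for v in data:
        mu1, mu2 = mean_field(v, W1, W2)
        accumulate(g1, v, mu1, 1.0 / len(data))
        accumulate(g2, mu1, mu2, 1.0 / len(data))
    for c in range(len(chains)):
        v, _, h2 = chains[c]
        chains[c] = gibbs_step(v, h2, W1, W2, rng)  # state persists across updates
        v, h1, h2 = chains[c]
        accumulate(g1, v, h1, -1.0 / len(chains))
        accumulate(g2, h1, h2, -1.0 / len(chains))
    for W, g in ((W1, g1), (W2, g2)):
        for i in range(len(W)):
            for j in range(len(W[0])):
                W[i][j] += lr * g[i][j]

# Toy usage with hypothetical sizes and data.
rng = random.Random(0)
n_v, n_h1, n_h2 = 6, 4, 3
W1 = [[rng.gauss(0.0, 0.01) for _ in range(n_h1)] for _ in range(n_v)]
W2 = [[rng.gauss(0.0, 0.01) for _ in range(n_h2)] for _ in range(n_h1)]
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
chains = [(sample([0.5] * n_v, rng), [0.0] * n_h1, sample([0.5] * n_h2, rng))
          for _ in range(4)]
for _ in range(50):
    train_step(data, chains, W1, W2, 0.05, rng)
```

The chains are never reset to the data between updates, which is what makes them persistent. A practical version would add biases, mini-batches, and a decaying learning rate, none of which are shown here.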
