Learning Deep Boltzmann Machines using Adaptive MCMC

When modeling high-dimensional, richly structured data, the distribution defined by a Deep Boltzmann Machine (DBM) often has a rough energy landscape with many local minima separated by high energy barriers. The commonly used Gibbs sampler tends to get trapped in a single local mode, which often makes learning dynamics unstable and leads to poor parameter estimates. In this paper, we concentrate on learning DBMs using adaptive MCMC algorithms. We first show a close connection between Fast PCD and adaptive MCMC. We then develop a Coupled Adaptive Simulated Tempering algorithm that can be used to better explore a highly multimodal energy landscape. Finally, we demonstrate that the proposed algorithm considerably improves parameter estimates, particularly when learning large-scale DBMs.
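To make the idea of tempering across energy barriers concrete, the following is a minimal sketch of generic simulated tempering with a Wang-Landau style adaptation of the temperature weights, run on a toy one-dimensional double-well energy. It is not the Coupled Adaptive Simulated Tempering algorithm developed in the paper: the energy function, the temperature ladder `betas`, the step size, the decay schedule, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def energy(x, barrier=8.0):
    """Toy double-well energy (an assumption): minima at x = -1 and x = +1,
    separated by a barrier of height `barrier` at x = 0."""
    return barrier * (x * x - 1.0) ** 2


def adaptive_simulated_tempering(n_steps=20000, betas=(1.0, 0.6, 0.3, 0.1),
                                 step=0.5, seed=0):
    """Sample (x, k) with density proportional to exp(log_w[k] - betas[k] * E(x)).

    The log-weights log_w are adapted on the fly so that all temperature
    levels are visited roughly equally often (Wang-Landau style update).
    """
    rng = random.Random(seed)
    x, k = -1.0, 0                      # start in the left well at beta = 1
    log_w = [0.0] * len(betas)          # adaptive log-weights, one per level
    visits = [0] * len(betas)
    cold_samples = []                   # samples collected at betas[0]

    for t in range(1, n_steps + 1):
        # 1) Metropolis update of x at the current inverse temperature.
        x_new = x + rng.gauss(0.0, step)
        log_acc = -betas[k] * (energy(x_new) - energy(x))
        if math.log(1.0 - rng.random()) < log_acc:
            x = x_new

        # 2) Metropolis update of the temperature index (symmetric proposal).
        k_new = k + rng.choice((-1, 1))
        if 0 <= k_new < len(betas):
            log_acc = (log_w[k_new] - log_w[k]) \
                - (betas[k_new] - betas[k]) * energy(x)
            if math.log(1.0 - rng.random()) < log_acc:
                k = k_new

        # 3) Adaptation: penalise the level just visited, with a decaying
        #    step size in the spirit of stochastic approximation.
        log_w[k] -= 1.0 / (1.0 + t / 100.0)

        visits[k] += 1
        if k == 0:
            cold_samples.append(x)

    return cold_samples, visits, log_w


if __name__ == "__main__":
    cold, visits, log_w = adaptive_simulated_tempering()
    right = sum(1 for x in cold if x > 0)
    print("visits per temperature level:", visits)
    print("fraction of beta=1 samples in the right well:",
          right / max(len(cold), 1))
```

At high temperature (small `betas[k]`) the barrier is effectively lowered, so the chain can move between wells and carry that mobility back to the `beta = 1` level. A sampler that stays at `beta = 1` throughout would be expected to remain in the well where it started for much longer.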
