On Tracking The Partition Function

Markov Random Fields (MRFs) have proven very powerful both as density estimators and feature extractors for classification. However, their use is often limited by an inability to estimate the partition function Z. In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to track the log partition function during learning. Our method relies on two distinct sources of information: (1) estimating the change AZ incurred by each gradient update, (2) estimating the difference in Z over a small set of tempered distributions using bridge sampling. The two sources of information are then combined using an inference procedure similar to Kalman filtering. Learning MRFs through Tempered Stochastic Maximum Likelihood, we can estimate Z using no more temperatures than are required for learning. Comparing to both exact values and estimates using annealed importance sampling (AIS), we show on several datasets that our method is able to accurately track the log partition function. In contrast to AIS, our method provides this estimate at each time-step, at a computational cost similar to that required for training alone.

[1]  Charles H. Bennett,et al.  Efficient estimation of free energy differences from Monte Carlo data , 1976 .

[2]  G. Parisi,et al.  Simulated tempering: a new Monte Carlo scheme , 1992, hep-lat/9205018.

[3]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[4]  Radford M. Neal Annealed importance sampling , 1998, Stat. Comput..

[5]  Yukito Iba EXTENDED ENSEMBLE MONTE CARLO , 2001 .

[6]  Zoubin Ghahramani,et al.  Bayesian Learning in Undirected Graphical Models , 2003 .

[7]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[8]  Zoubin Ghahramani,et al.  Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms , 2004, UAI.

[9]  Radford M. Neal Estimating Ratios of Normalizing Constants Using Linked Importance Sampling , 2005, math/0511216.

[10]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[11]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[12]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[13]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[14]  Ruslan Salakhutdinov,et al.  On the quantitative analysis of deep belief networks , 2008, ICML '08.

[15]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[16]  Geoffrey E. Hinton,et al.  Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.

[17]  P. Tavan,et al.  Efficiency of exchange schemes in replica exchange , 2009 .

[18]  Geoffrey E. Hinton,et al.  Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[19]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[20]  Ruslan Salakhutdinov,et al.  Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[21]  Yoshua Bengio,et al.  Adaptive Parallel Tempering for Stochastic Maximum Likelihood Learning of RBMs , 2010, ArXiv.

[22]  Yoshua Bengio,et al.  Tractable Multivariate Binary Density Estimation and the Restricted Boltzmann Forest , 2010, Neural Computation.

[23]  Ruslan Salakhutdinov,et al.  Learning Deep Boltzmann Machines using Adaptive MCMC , 2010, ICML.

[24]  Pascal Vincent,et al.  Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.

[25]  Nando de Freitas,et al.  Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[26]  Tapani Raiko,et al.  Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.

[27]  Hugo Larochelle,et al.  The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[28]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.