Accurate and conservative estimates of MRF log-likelihood using reverse annealing

Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test log-probabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. However, AIS is prone to overestimate the log-likelihood with little indication that anything is wrong. We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. RAISE requires only the same MCMC transition operators as standard AIS. Experimental results indicate that RAISE agrees closely with AIS log-probability estimates for RBMs, DBMs, and DBNs, but typically errs on the side of underestimating, rather than overestimating, the log-likelihood.
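The AIS procedure the abstract refers to can be sketched on a toy problem. Below is a minimal, illustrative implementation of standard AIS (not the paper's RAISE estimator): it anneals from a standard normal initial distribution to a wider unnormalized Gaussian target along a geometric path, using Metropolis-Hastings transitions, and estimates the ratio of partition functions. All names and the toy target are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 2.0  # target std; the true partition-function ratio Z1/Z0 equals SIGMA

def log_f(x, beta):
    # Geometric path between unnormalized densities:
    # (1 - beta) * log f0(x) + beta * log f1(x)
    return (1 - beta) * (-0.5 * x**2) + beta * (-0.5 * x**2 / SIGMA**2)

def ais(n_chains=1000, n_steps=200):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_chains)      # exact samples from the initial p0
    log_w = np.zeros(n_chains)             # log importance weights
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Accumulate the AIS weight update at the current states
        log_w += log_f(x, b) - log_f(x, b_prev)
        # One Metropolis-Hastings step leaving the intermediate f_b invariant
        prop = x + 0.5 * rng.standard_normal(n_chains)
        accept = np.log(rng.uniform(size=n_chains)) < log_f(prop, b) - log_f(x, b)
        x = np.where(accept, prop, x)
    # log-mean-exp of the weights estimates log(Z1 / Z0)
    return np.logaddexp.reduce(log_w) - np.log(n_chains)

print(np.exp(ais()))  # ≈ SIGMA
```

Because the estimate is a mean of importance weights, it is unbiased for Z1/Z0 but, as the abstract notes, the corresponding log-likelihood estimate tends to err on the high side; RAISE addresses this by running the same transition operators in reverse to obtain a stochastic lower bound.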
