Partition Functions from Rao-Blackwellized Tempered Sampling

Partition functions of probability distributions are important quantities for model evaluation and comparison. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures and estimates the partition function as a simple quotient of Rao-Blackwellized estimates of the marginal inverse temperature probabilities, which are updated while sampling. We show that the method has interesting connections with several popular alternative methods and offers some significant advantages. In particular, we find empirically that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs). Moreover, the method is sufficiently accurate to track training and validation log-likelihoods during the learning of RBMs, at minimal computational cost.
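The quotient estimator described above can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a joint tempered distribution q(x, k) proportional to f_k(x) r_k, so that the marginal q(k) is proportional to Z_k r_k and Z_k / Z_1 = (q(k) / r_k) / (q(1) / r_1). The unnormalized densities f_k(x) = exp(-x^2 / (2 sigma_k^2)), the fixed weights r_k, and the function name `rts_log_ratios` are all hypothetical choices made so that the true ratios (sigma_k / sigma_1) are known. The weights are held fixed here rather than adapted during sampling.

```python
import numpy as np

def rts_log_ratios(sigmas, log_r, n_steps=20000, seed=0):
    """Estimate log(Z_k / Z_1) for toy densities f_k(x) = exp(-x^2 / (2 sigma_k^2)).

    Assumption: Gibbs sampling on the joint q(x, k) ~ f_k(x) r_k, alternating
    an exact draw of x given k with a draw of k given x.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    log_r = np.asarray(log_r, dtype=float)
    K = len(sigmas)
    k = 0
    c_hat = np.zeros(K)  # running sum of q(k | x_i)
    for _ in range(n_steps):
        # Sample x | k exactly (possible only because this toy target is Gaussian).
        x = rng.normal(0.0, sigmas[k])
        # Conditional q(k | x) ~ f_k(x) r_k, computed stably in log space.
        log_w = -0.5 * x**2 / sigmas**2 + log_r
        w = np.exp(log_w - log_w.max())
        cond = w / w.sum()
        # Rao-Blackwellization: average the conditional probabilities
        # instead of counting visits to each temperature index.
        c_hat += cond
        k = rng.choice(K, p=cond)
    c_hat /= n_steps
    # Quotient of marginal estimates, corrected for the weights r_k.
    log_q = np.log(c_hat) - log_r
    return log_q - log_q[0]

if __name__ == "__main__":
    sigmas = np.array([1.0, 1.5, 2.0, 3.0])
    log_r = np.log(np.full(4, 0.25))
    print("estimated:", rts_log_ratios(sigmas, log_r))
    print("exact:    ", np.log(sigmas / sigmas[0]))
```

Averaging the conditional probabilities q(k | x_i) uses every sample to update every temperature's estimate, which typically gives lower variance than a histogram of visited temperatures. In a realistic setting the exact draw of x given k would be replaced by an MCMC transition.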
