Algorithms for Estimating the Partition Function of Restricted Boltzmann Machines (Extended Abstract)

Estimating the normalization constants (partition functions) of energy-based probabilistic models (Markov random fields) with high accuracy is required for measuring performance, monitoring the training progress of adaptive models, and conducting likelihood-ratio tests. We devised a unifying theoretical framework for partition function estimation algorithms, including Annealed Importance Sampling (AIS) and Bennett’s Acceptance Ratio method (BAR). The unification reveals conceptual similarities and differences between the approaches and suggests new algorithms. The framework is based on a generalized form of Crooks’ equality, which links the expectation over a distribution of samples generated by a transition operator to the expectation over the distribution induced by the reversed operator. Different sampling strategies, such as parallel tempering and path sampling, are covered by the framework. In experiments, we estimated the partition functions of restricted Boltzmann machines (RBMs) and Ising models. We found that BAR using parallel tempering worked well with a small number of bridging distributions, while path-sampling-based AIS performed best with many bridging distributions. The partition function is estimated relative to that of a reference distribution, and the choice of this reference distribution turned out to be very important in our experiments. Overall, BAR gave the best empirical results, outperforming AIS.
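
To make the two estimators concrete, the following is a minimal, self-contained NumPy sketch, not the implementation used in the paper: it estimates log Z of a toy RBM with AIS along the common parameter-scaling path and, for a single pair of distributions, with the Bennett / Meng-Wong fixed-point iteration that underlies BAR. All model sizes, schedules, sample counts, and function names (log_f, gibbs_step, ais_log_Z, bar_log_ratio) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RBM: 6 visible and 4 hidden binary units with small random parameters
# (sizes chosen so that the exact partition function can be computed by
# brute force for comparison).
nv, nh = 6, 4
W = 0.5 * rng.standard_normal((nv, nh))
b = 0.1 * rng.standard_normal(nv)   # visible bias
c = 0.1 * rng.standard_normal(nh)   # hidden bias


def log_f(v, beta):
    """Log unnormalised marginal of the beta-scaled RBM, hidden units summed out."""
    return beta * (v @ b) + np.logaddexp(0.0, beta * (c + v @ W)).sum(axis=-1)


def gibbs_step(v, beta):
    """One block-Gibbs sweep (h given v, then v given h) at scaling beta."""
    h = (rng.random((v.shape[0], nh)) < 1.0 / (1.0 + np.exp(-beta * (c + v @ W)))).astype(float)
    return (rng.random(v.shape) < 1.0 / (1.0 + np.exp(-beta * (b + h @ W.T)))).astype(float)


# Exact log Z by enumerating all 2**nv visible configurations (ground truth).
all_v = np.array([[(i >> k) & 1 for k in range(nv)] for i in range(2 ** nv)], float)
log_Z_exact = np.logaddexp.reduce(log_f(all_v, 1.0))

# --- AIS along the parameter-scaling path beta: 0 -> 1 ---------------------
# Reference distribution: beta = 0, i.e. uniform over v and h, with
# log Z_0 = (nv + nh) * log 2.
log_Z0 = (nv + nh) * np.log(2.0)


def ais_log_Z(n_runs=200, n_betas=500):
    betas = np.linspace(0.0, 1.0, n_betas)
    v = (rng.random((n_runs, nv)) < 0.5).astype(float)    # exact samples from p_0
    log_w = np.zeros(n_runs)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        log_w += log_f(v, beta) - log_f(v, beta_prev)     # importance-weight update
        v = gibbs_step(v, beta)                           # transition towards p_beta
    # log of the mean importance weight, computed stably
    return log_Z0 + np.logaddexp.reduce(log_w) - np.log(n_runs)


# --- BAR / bridge sampling for one pair of distributions -------------------
def bar_log_ratio(log_l0, log_l1, n_iter=100):
    """log(Z_1/Z_0) via the Bennett / Meng-Wong fixed point.

    log_l0: log f_1/f_0 evaluated at samples from p_0;
    log_l1: the same ratio evaluated at samples from p_1.
    """
    n0, n1 = len(log_l0), len(log_l1)
    s0, s1 = n0 / (n0 + n1), n1 / (n0 + n1)
    log_r = 0.0
    for _ in range(n_iter):
        num = np.exp(log_l0 - np.logaddexp(np.log(s0) + log_r, np.log(s1) + log_l0)).mean()
        den = np.exp(-np.logaddexp(np.log(s0) + log_r, np.log(s1) + log_l1)).mean()
        log_r = np.log(num) - np.log(den)
    return log_r


print("exact log Z :", log_Z_exact)
print("AIS estimate:", ais_log_Z())

# Crude single-bridge BAR demo between beta = 0 and beta = 1: p_0 is sampled
# exactly, p_1 approximately by long Gibbs chains.
v0 = (rng.random((2000, nv)) < 0.5).astype(float)
v1 = (rng.random((2000, nv)) < 0.5).astype(float)
for _ in range(200):
    v1 = gibbs_step(v1, 1.0)
print("BAR estimate:", log_Z0 + bar_log_ratio(log_f(v0, 1.0) - log_f(v0, 0.0),
                                              log_f(v1, 1.0) - log_f(v1, 0.0)))
```

In the setting described above, BAR would not rely on a single bridge from the reference to the target as in this crude demo; instead, equilibrium samples at a small number of bridging distributions (for example, obtained via parallel tempering) are used and the pairwise log ratios are chained to obtain the full estimate of the partition function.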
