Efficient Evaluation of the Partition Function of RBMs with Annealed Importance Sampling

Probabilistic models based on Restricted Boltzmann Machines (RBMs) require the evaluation of normalized Boltzmann factors, which in turn requires computing the partition function Z. The exact evaluation of Z, however, becomes prohibitively expensive as the system size increases. The situation is even worse for the most common learning algorithms for RBMs, where the exact evaluation of the gradient of the log-likelihood of the empirical data distribution involves computing Z at each iteration. The Annealed Importance Sampling (AIS) method provides a tool to stochastically estimate the partition function of the system. So far, the standard use of the AIS algorithm in the machine learning context has relied on a large number of Monte Carlo steps. In this work we show that this may not be required if a proper starting probability distribution is used to initialize the AIS algorithm. We analyze the performance of AIS on both small and large problems, and show that in both cases a good estimate of Z can be obtained at little computational cost.
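As a concrete illustration of the technique, the sketch below estimates log Z of a binary RBM with standard AIS in the style of Salakhutdinov and Murray: the annealing path interpolates from a zero-weight base-rate RBM, whose partition function is known in closed form, to the target RBM, with one Gibbs transition per intermediate distribution. This is a minimal NumPy sketch, not the paper's implementation; the parameter names (W, a, b, a0), the linear temperature schedule, and the run counts are assumptions.

```python
# Minimal AIS sketch for estimating log Z of a binary RBM (NumPy only).
# Assumed parameters: W (n_vis x n_hid weights), a (visible biases),
# b (hidden biases), a0 (visible biases of the zero-weight base-rate RBM).
import numpy as np

def log_p_star(v, beta, W, a, b, a0):
    """Unnormalized log-probability of v under the intermediate model
    p_beta(v) ~ exp((1-beta) a0.v + beta a.v) * prod_j (1 + exp(beta (b_j + v.W_j)))."""
    return ((1 - beta) * v @ a0 + beta * v @ a
            + np.logaddexp(0.0, beta * (v @ W + b)).sum(axis=1))

def ais_log_z(W, a, b, n_runs=100, n_betas=1000, a0=None, rng=None):
    rng = np.random.default_rng(rng)
    n_vis, n_hid = W.shape
    a0 = np.zeros(n_vis) if a0 is None else a0
    betas = np.linspace(0.0, 1.0, n_betas + 1)

    # log Z of the base-rate RBM (zero weights, zero hidden biases):
    # Z_0 = 2^n_hid * prod_i (1 + exp(a0_i)).
    log_z0 = n_hid * np.log(2.0) + np.logaddexp(0.0, a0).sum()

    # Draw the starting states from the factorial base distribution p_0.
    v = (rng.random((n_runs, n_vis)) < 1.0 / (1.0 + np.exp(-a0))).astype(float)
    log_w = np.zeros(n_runs)

    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Accumulate the importance weights p*_beta(v) / p*_beta_prev(v).
        log_w += (log_p_star(v, beta, W, a, b, a0)
                  - log_p_star(v, beta_prev, W, a, b, a0))
        # One Gibbs sweep that leaves the intermediate model p_beta invariant.
        p_h = 1.0 / (1.0 + np.exp(-beta * (v @ W + b)))
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = 1.0 / (1.0 + np.exp(-((1 - beta) * a0 + beta * (a + h @ W.T))))
        v = (rng.random(p_v.shape) < p_v).astype(float)

    # log Z_target ~= log Z_0 + log mean_r exp(log_w_r).
    return log_z0 + (np.logaddexp.reduce(log_w) - np.log(n_runs))
```

In this scheme, the role of the starting distribution discussed in the abstract corresponds to the choice of a0: if the base-rate biases are matched to, say, the empirical marginals of the data rather than left at zero, the annealing path starts closer to the target model, which is one plausible reading of why far fewer intermediate steps (n_betas) may then suffice for a good estimate.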
