Adaptive Path Sampling in Metastable Posterior Distributions

The normalizing constant plays an important role in Bayesian computation, and there is a large literature on methods for computing or approximating normalizing constants that cannot be evaluated in closed form. When the normalizing constant varies by orders of magnitude, methods based on importance sampling can require many rounds of tuning. We present an improved approach using adaptive path sampling, iteratively reducing the gap between the base and target distributions. Using this adaptive strategy, we develop two metastable sampling schemes; both are automated in Stan and require little tuning. For a multimodal posterior density, we equip simulated tempering with a continuous temperature. For a funnel-shaped entropic barrier, we adaptively increase the mass in the bottleneck region to form an implicit divide-and-conquer scheme. Both approaches empirically outperform existing methods for sampling from metastable distributions, with higher accuracy and better computational efficiency.
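
For background, the quantity estimated by path sampling is the thermodynamic integration identity of Gelman and Meng (1998): given a family of unnormalized densities q(\theta, \lambda) joining a base (\lambda = 0) to a target (\lambda = 1), with normalizing constant Z(\lambda) = \int q(\theta, \lambda)\, d\theta,

\log \frac{Z(1)}{Z(0)} = \int_0^1 \mathbb{E}_{\theta \mid \lambda}\!\left[ \frac{\partial}{\partial \lambda} \log q(\theta, \lambda) \right] d\lambda.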

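The sketch below is a minimal Python illustration of this identity using static (non-adaptive) path sampling on a toy problem. The geometric path, the target scale sigma, and the exact conditional draws are assumptions made for the demo; the paper's adaptive scheme instead treats lambda as a continuous auxiliary variable sampled jointly with theta and, roughly, feeds the running estimate of log Z(lambda) back as a pseudo-prior so the sampler covers the whole path.

import numpy as np

# Static path sampling (thermodynamic integration) on a toy problem:
# geometric path between a standard normal base p0 (so Z(0) = 1) and an
# unnormalized Gaussian target q1(theta) = exp(-theta^2 / (2 sigma^2)),
# whose true log normalizing constant is log(sigma * sqrt(2 * pi)).

rng = np.random.default_rng(0)
sigma = 3.0                          # target scale, chosen for the demo
lambdas = np.linspace(0.0, 1.0, 41)  # discretized temperature ladder
n_draws = 20_000                     # draws per ladder rung

def u(theta):
    # Path derivative d/dlambda log q(theta, lambda). For the geometric
    # path log q = (1 - lambda) log p0 + lambda log q1, this is
    # log q1(theta) - log p0(theta), independent of lambda.
    log_p0 = -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)
    log_q1 = -0.5 * theta**2 / sigma**2
    return log_q1 - log_p0

# Estimate E[u | lambda] at each rung. Along this path, theta | lambda is
# Gaussian with precision (1 - lambda) + lambda / sigma^2, so we can draw
# exactly instead of running MCMC.
means = np.empty(len(lambdas))
for i, lam in enumerate(lambdas):
    prec = (1 - lam) + lam / sigma**2
    theta = rng.normal(0.0, 1.0 / np.sqrt(prec), size=n_draws)
    means[i] = u(theta).mean()

# Trapezoidal rule over lambda gives the estimate of log Z(1) - log Z(0).
log_z_hat = np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(lambdas))
print(f"path sampling estimate: {log_z_hat:.4f}")
print(f"true log Z(1):          {np.log(sigma * np.sqrt(2 * np.pi)):.4f}")

With these settings the estimate lands close to the truth (about 2.02); in realistic posteriors the conditional draws would come from MCMC, and the accuracy of the lambda integral is exactly what the adaptive reweighting in the paper is designed to protect.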