Entropy-based adaptive Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm for sampling from an unnormalized probability distribution. In practice, HMC is commonly implemented with a leapfrog integrator, but its performance can be sensitive to the choice of mass matrix used therein. We develop a gradient-based algorithm that adapts the mass matrix by encouraging the leapfrog integrator to achieve high acceptance rates while exploring all dimensions jointly. In contrast to previous work that adapts the hyperparameters of HMC using some form of expected squared jumping distance, the adaptation strategy suggested here aims to increase sampling efficiency by maximizing an approximation of the proposal entropy. We illustrate that using multiple gradient steps in the HMC proposal can be beneficial compared to the single gradient step of Metropolis-adjusted Langevin proposals. Empirical evidence suggests that the adaptation method can outperform different versions of HMC schemes by adjusting the mass matrix to the geometry of the target distribution and by providing some control over the integration time.
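For concreteness, below is a minimal sketch of one Metropolis-adjusted HMC transition using a leapfrog integrator with a diagonal mass matrix, the object the adaptation targets. The function names (`leapfrog`, `hmc_step`), the standard-Gaussian example target, and all tuning values are illustrative assumptions; the entropy-based adaptation of the mass matrix itself is not implemented here.

```python
import numpy as np

def leapfrog(q, p, grad_log_prob, step_size, n_steps, inv_mass_diag):
    # Stormer-Verlet / leapfrog integration of the Hamiltonian dynamics.
    q, p = q.copy(), p.copy()
    p = p + 0.5 * step_size * grad_log_prob(q)   # initial half momentum step
    for _ in range(n_steps - 1):
        q = q + step_size * inv_mass_diag * p    # full position step
        p = p + step_size * grad_log_prob(q)     # full momentum step
    q = q + step_size * inv_mass_diag * p        # last position step
    p = p + 0.5 * step_size * grad_log_prob(q)   # final half momentum step
    return q, p

def hmc_step(q, log_prob, grad_log_prob, step_size, n_steps, mass_diag, rng):
    # One Metropolis-adjusted HMC transition; returns the next state.
    inv_mass_diag = 1.0 / mass_diag
    p = rng.normal(size=q.shape) * np.sqrt(mass_diag)   # momentum p ~ N(0, M)
    q_new, p_new = leapfrog(q, p, grad_log_prob, step_size, n_steps, inv_mass_diag)
    # Hamiltonian H(q, p) = -log pi(q) + 0.5 * p^T M^{-1} p
    h_old = -log_prob(q) + 0.5 * np.sum(inv_mass_diag * p ** 2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(inv_mass_diag * p_new ** 2)
    # Accept with probability min(1, exp(h_old - h_new)).
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new
    return q

# Illustrative usage on a standard Gaussian target (an assumption for this sketch):
rng = np.random.default_rng(0)
log_prob = lambda x: -0.5 * np.sum(x ** 2)
grad_log_prob = lambda x: -x
q = np.zeros(2)
for _ in range(1000):
    q = hmc_step(q, log_prob, grad_log_prob, step_size=0.5, n_steps=10,
                 mass_diag=np.ones(2), rng=rng)
```

In this parameterization, adapting HMC amounts to choosing `mass_diag` (or a full mass matrix) together with the step size and number of leapfrog steps so that proposals remain accurate while moving far in every direction of the target; the entropy-based objective described above is one way to drive that choice.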
