The Barker proposal: combining robustness and efficiency in gradient-based MCMC

There is a tension between robustness and efficiency when designing Markov chain Monte Carlo (MCMC) sampling algorithms. Here we focus on robustness with respect to tuning parameters, showing that more sophisticated algorithms tend to be more sensitive to the choice of step-size parameter and less robust to heterogeneity of the distribution of interest. We characterise this phenomenon by studying the behaviour of spectral gaps as an increasingly poor step-size is chosen for the algorithm. Motivated by these considerations, we propose a novel and simple gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Extensive theoretical results, dealing with robustness to tuning, geometric ergodicity and scaling with dimension, suggest that the novel scheme combines the robustness of simple schemes with the efficiency of gradient-based ones. We show numerically that this type of robustness is particularly beneficial in the context of adaptive MCMC, giving examples where our proposed scheme significantly outperforms state-of-the-art alternatives.
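The scheme described above can be illustrated with a short sketch. This is not the authors' code: it is a minimal reconstruction of the Barker-style mechanism the abstract refers to, in which each Gaussian increment keeps or flips its sign with a logistic probability driven by the gradient of the log-target (the classical Barker accept-reject rule applied coordinate-wise), followed by a standard Metropolis-Hastings correction. All function and variable names here are assumptions for illustration.

```python
import numpy as np

def barker_step(x, log_pi, grad_log_pi, sigma, rng):
    """One Barker-proposal-style MCMC step (illustrative sketch).

    x            -- current state, 1-D NumPy array
    log_pi       -- log-density of the target, up to a constant
    grad_log_pi  -- gradient of log_pi
    sigma        -- step-size tuning parameter
    """
    g_x = grad_log_pi(x)
    z = sigma * rng.standard_normal(x.shape)
    # Keep each increment z_i with probability 1 / (1 + exp(-z_i * g_i)),
    # otherwise flip its sign, so moves tend uphill in log_pi.
    keep = rng.random(x.shape) < 1.0 / (1.0 + np.exp(-z * g_x))
    y = x + np.where(keep, z, -z)
    d = y - x
    # Metropolis-Hastings correction: the symmetric Gaussian factors of the
    # proposal cancel, leaving only the logistic "skewing" terms.
    log_q_ratio = np.sum(np.logaddexp(0.0, -d * g_x)
                         - np.logaddexp(0.0, d * grad_log_pi(y)))
    log_alpha = log_pi(y) - log_pi(x) + log_q_ratio
    return y if np.log(rng.random()) < log_alpha else x

# Demo on a standard normal target, where the exact answers are known.
log_pi = lambda x: -0.5 * np.sum(x ** 2)
grad_log_pi = lambda x: -x
rng = np.random.default_rng(0)
x = np.zeros(1)
samples = []
for _ in range(20000):
    x = barker_step(x, log_pi, grad_log_pi, sigma=1.5, rng=rng)
    samples.append(x[0])
samples = np.array(samples)
```

Because the sign-flip probability varies smoothly between 0 and 1 rather than switching abruptly, a mis-specified `sigma` degrades the sampler gracefully, which is the robustness-to-tuning property the abstract emphasises.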
