Optimal design of the Barker proposal and other locally-balanced Metropolis-Hastings algorithms

We study the class of first-order locally-balanced Metropolis–Hastings algorithms introduced in [22]. To choose a specific algorithm within the class, the user must select a balancing function g : R → R satisfying g(t) = t g(1/t) and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and a scaling of n^{-1/3}, as the dimension n tends to infinity, among all members of the class, under mild smoothness assumptions on g and when the target distribution is of product form. In particular, we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive the optimal choice of noise distribution for the Barker proposal, the optimal choice of balancing function under a Gaussian noise distribution, and the optimal choice of first-order locally-balanced algorithm among the entire class; the last of these turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and, in particular, show that a bimodal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
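
To make the setup concrete, the sketch below performs one Metropolis-Hastings step with the Barker proposal, i.e. the member of the class with balancing function g(t) = t/(1+t), under Gaussian noise (the baseline that the bimodal variant above improves upon). This is a minimal illustration under those assumptions, not the authors' implementation; the names barker_step, log_pi and grad_log_pi are placeholders for a user-supplied log target density and its gradient.

    import numpy as np

    def barker_step(x, log_pi, grad_log_pi, sigma, rng):
        # One Metropolis-Hastings step with the Barker proposal (Gaussian noise).
        # x: current state (1-d array); sigma: step size; rng: numpy Generator.
        g_x = grad_log_pi(x)
        z = sigma * rng.standard_normal(x.shape)       # symmetric noise increments
        # Move each coordinate towards higher density with probability sigmoid(z * grad).
        p_plus = 0.5 * (1.0 + np.tanh(0.5 * z * g_x))  # numerically stable sigmoid
        b = np.where(rng.uniform(size=x.shape) < p_plus, 1.0, -1.0)
        y = x + b * z

        # Metropolis-Hastings correction: target ratio times proposal-density ratio.
        g_y = grad_log_pi(y)
        d = y - x
        log_alpha = (log_pi(y) - log_pi(x)
                     + np.sum(np.logaddexp(0.0, -d * g_x) - np.logaddexp(0.0, d * g_y)))
        if np.log(rng.uniform()) < log_alpha:
            return y
        return x

    # Example: 1000 steps on a standard Gaussian target in 10 dimensions.
    rng = np.random.default_rng(0)
    x = np.zeros(10)
    for _ in range(1000):
        x = barker_step(x, lambda v: -0.5 * v @ v, lambda v: -v, sigma=1.0, rng=rng)

Tuning sigma so that roughly 57% of proposals are accepted puts the chain near the limiting optimum described above.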

[1] D. Dittmar. Slice Sampling, 2000.

[2] Giacomo Zanella, et al. Informed Proposals for Local MCMC in Discrete Spaces, 2017, Journal of the American Statistical Association.

[3] Jiqiang Guo, et al. Stan: A Probabilistic Programming Language, 2017, Journal of Statistical Software.

[4] P. Fearnhead, et al. The Random Walk Metropolis: Linking Theory and Practice Through a Case Study, 2010, arXiv:1011.6217.

[5] Paul Fearnhead, et al. Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo, 2016, Statistical Science.

[6] Wilfrid S. Kendall, et al. A Dirichlet Form approach to MCMC Optimal Scaling, 2016.

[7] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[8] Samuel Power, et al. Accelerated Sampling on Discrete Spaces with Non-Reversible Markov Processes, 2019.

[9] J. Rosenthal, et al. Optimal scaling for various Metropolis-Hastings algorithms, 2001.

[10] Carlos E. Rodríguez, et al. Searching for efficient Markov chain Monte Carlo proposal kernels, 2013, Proceedings of the National Academy of Sciences.

[11] Jeffrey S. Rosenthal. AMCMC: An R interface for adaptive MCMC, 2007, Computational Statistics & Data Analysis.

[12] S. Duane, et al. Hybrid Monte Carlo, 1987.

[13] A. Barker. Monte Carlo calculations of the radial distribution functions for a proton-electron plasma, 1965.

[14] A. Gelman, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms, 1997.

[15] Christophe Andrieu, et al. A tutorial on adaptive MCMC, 2008, Statistics and Computing.

[16] Gareth O. Roberts, et al. Examples of Adaptive MCMC, 2009.

[18] G. Roberts, et al. Optimal Scaling of Random Walk Metropolis Algorithms with Non-Gaussian Proposals, 2011.

[19] Michael C. H. Choi, et al. Metropolis–Hastings reversiblizations of non-reversible Markov chains, 2017, Stochastic Processes and their Applications.

[20] O. Kallenberg. Foundations of Modern Probability, 2021, Probability Theory and Stochastic Modelling.

[21] J. Rosenthal, et al. Optimal scaling of discrete approximations to Langevin diffusions, 1998.

[22] Giacomo Zanella, et al. The Barker proposal: combining robustness and efficiency in gradient-based MCMC, 2019.

[23] L. Tierney. A note on Metropolis-Hastings kernels for general state spaces, 1998.