Estimating Convergence of Markov chains with L-Lag Couplings

Markov chain Monte Carlo (MCMC) methods generate samples whose distribution converges to a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and the marginal distribution of the chain after a fixed number of iterations. These bounds are derived case by case and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers.
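
As a rough illustration of how such a bound estimate could be computed, the sketch below runs pairs of chains X and Y where Y lags X by L steps, records the meeting time tau = inf{t > L : X_t = Y_{t-L}}, and averages max(0, ceil((tau - L - t) / L)) over independent runs as an estimated upper bound on the total variation distance at iteration t. The abstract does not state this formula, so treat its exact form as an assumption here. The standard normal target, the random-walk Metropolis-Hastings kernel, the N(10, 1) initial distribution, the maximal coupling of proposals, and all function names are illustrative choices, not taken from the text.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of the illustrative N(0, 1) target (assumption).
    return -0.5 * x * x


def log_normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_uniform(rng):
    # Log of a draw from Uniform(0, 1]; shifting by 1.0 avoids log(0).
    return math.log(1.0 - rng.random())


def maximal_coupling(mean_x, mean_y, sd, rng):
    # Draw (px, py) with px ~ N(mean_x, sd), py ~ N(mean_y, sd),
    # maximizing the probability that px == py.
    px = rng.gauss(mean_x, sd)
    if log_uniform(rng) + log_normal_pdf(px, mean_x, sd) <= log_normal_pdf(px, mean_y, sd):
        return px, px
    while True:
        py = rng.gauss(mean_y, sd)
        if log_uniform(rng) + log_normal_pdf(py, mean_y, sd) > log_normal_pdf(py, mean_x, sd):
            return px, py


def mh_step(x, sd, rng):
    # One random-walk Metropolis-Hastings step.
    prop = rng.gauss(x, sd)
    return prop if log_uniform(rng) < log_target(prop) - log_target(x) else x


def coupled_mh_step(x, y, sd, rng):
    # Coupled step: maximally coupled proposals, shared acceptance uniform.
    px, py = maximal_coupling(x, y, sd, rng)
    log_u = log_uniform(rng)
    new_x = px if log_u < log_target(px) - log_target(x) else x
    new_y = py if log_u < log_target(py) - log_target(y) else y
    return new_x, new_y


def meeting_time(lag, rng, sd=1.0, max_iter=100_000):
    # Both chains start from N(10, 1), far from the target (assumption).
    x = rng.gauss(10.0, 1.0)
    y = rng.gauss(10.0, 1.0)
    for _ in range(lag):  # advance X alone by `lag` steps
        x = mh_step(x, sd, rng)
    t = lag
    while t < max_iter:
        x, y = coupled_mh_step(x, y, sd, rng)
        t += 1
        if x == y:  # X_t == Y_{t - lag}
            return t
    raise RuntimeError("chains did not meet within max_iter steps")


def tv_upper_bound(t, lag, meeting_times):
    # Empirical average of max(0, ceil((tau - lag - t) / lag)).
    terms = [max(0, math.ceil((tau - lag - t) / lag)) for tau in meeting_times]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rng = random.Random(1)
    lag = 50
    taus = [meeting_time(lag, rng) for _ in range(200)]
    for t in (0, 10, 20, 50):
        print(t, tv_upper_bound(t, lag, taus))
```

Under these assumptions, a burn-in could be chosen as the smallest t at which the estimated bound falls below a chosen tolerance. A larger lag generally costs more computation per run.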
