Unbiased Markov chain Monte Carlo with couplings

Markov chain Monte Carlo (MCMC) methods provide consistent estimators of integrals as the number of iterations goes to infinity. MCMC estimators are generally biased after any fixed number of iterations. We propose to remove this bias by using couplings of Markov chains together with a telescopic sum argument of Glynn and Rhee (2014). The resulting unbiased estimators can be computed independently in parallel. We discuss practical couplings for popular MCMC algorithms. We establish the theoretical validity of the proposed estimators and study their efficiency relative to the underlying MCMC algorithms. Finally, we illustrate the performance and limitations of the method on toy examples, on an Ising model around its critical temperature, on a high-dimensional variable selection problem, and on an approximation of the cut distribution arising in Bayesian inference for models made of multiple modules.

[1] Andrew Gelman, et al. Handbook of Markov Chain Monte Carlo, 2011.

[2] Anthony Brockwell. Parallel Markov chain Monte Carlo Simulation by Pre-Fetching, 2006.

[3] É. Moulines, et al. On the convergence of Hamiltonian Monte Carlo, 2017, arXiv:1705.00166.

[4] Arnaud Doucet, et al. On the Utility of Graphics Cards to Perform Massively Parallel Simulation of Advanced Monte Carlo Methods, 2009, Journal of Computational and Graphical Statistics.

[5] Murali Haran, et al. Markov chain Monte Carlo: Can we trust the third significant figure?, 2007, arXiv:math/0703746.

[6] R. Tweedie. The existence of moments for stationary Markov chains, 1983, Journal of Applied Probability.

[7] A. Doucet, et al. Particle Markov chain Monte Carlo methods, 2010.

[8] E. Mainini. A description of transport cost for signed measures, 2012.

[9] G. Roberts, et al. Convergence of Heavy-tailed Monte Carlo Markov Chain Algorithms, 2007.

[10] Martin J. Wainwright, et al. On the Computational Complexity of High-Dimensional Bayesian Variable Selection, 2015, arXiv.

[11] Art B. Owen, et al. Statistically Efficient Thinning of a Markov Chain Sampler, 2015, arXiv.

[12] T. Lindvall. Lectures on the Coupling Method, 1992.

[13] G. Casella, et al. Explaining the Perfect Sampler, 2001.

[14] Wenyi Wang, et al. Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors, 2016, Bioinform.

[15] A. Doucet, et al. Piecewise-Deterministic Markov Chain Monte Carlo, 2017, arXiv:1707.05296.

[16] Sally Rosenthal, et al. Parallel computing and Monte Carlo algorithms, 1999.

[17] Peter W. Glynn, et al. Exact estimation for Markov chain equilibrium expectations, 2014, Journal of Applied Probability.

[18] Jeffrey S. Rosenthal, et al. Analysis of the Gibbs Sampler for a Model Related to James-Stein Estimators, 2007.

[19] Bin Yu, et al. Regeneration in Markov chain samplers, 1995.

[20] Adityanand Guntuboyina, et al. On risk bounds in isotonic and other shape restricted regression problems, 2013, arXiv:1311.3765.

[21] Hee Min Choi, et al. The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic, 2013.

[22] J. Rosenthal. Quantitative Convergence Rates of Markov Chains: A Simple Account, 2002.

[23] C. Morris. Parametric Empirical Bayes Inference: Theory and Applications, 1983.

[24] C. Geyer. Markov Chain Monte Carlo Maximum Likelihood, 1991.

[25] J. Propp, et al. Exact sampling with coupled Markov chains and applications to statistical mechanics, 1996.

[26] P. Jacob, et al. On nonnegative unbiased estimators, 2013, arXiv:1309.6473.

[27] Volkan Cevher, et al. WASP: Scalable Bayes via barycenters of subset posteriors, 2015, AISTATS.

[28] Ryan P. Adams, et al. Elliptical slice sampling, 2009, AISTATS.

[29] P. Reutter. General Strategies for Assessing Convergence of MCMC Algorithms Using Coupled Sample Paths, 1995.

[30] R. Tweedie, et al. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, 1996.

[31] Christian P. Robert, et al. Bayesian computation: a summary of the current state, and samples backwards and forwards, 2015, Statistics and Computing.

[32] James G. Scott, et al. Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables, 2012, arXiv:1205.0310.

[33] Pierre E. Jacob, et al. Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation, 2018, arXiv:1810.01382.

[34] M. Knott, et al. On the optimal mapping of distributions, 1984.

[35] James M. Flegal, et al. A Practical Sequential Stopping Rule for High-Dimensional Markov Chain Monte Carlo, 2016.

[36] M. Girolami, et al. Riemann manifold Langevin and Hamiltonian Monte Carlo methods, 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[37] C. Andrieu, et al. The pseudo-marginal approach for efficient Monte Carlo computations, 2009, arXiv:0903.5480.

[38] A. Doucet, et al. Asymptotic bias of stochastic gradient search, 2017.

[39] V. Johnson. Studying Convergence of Markov Chain Monte Carlo Algorithms Using Coupled Sample Paths, 1996.

[40] Kshitij Khare, et al. Geometric ergodicity for Bayesian shrinkage models, 2014.

[41] Fredrik Lindsten, et al. Smoothing With Couplings of Conditional Particle Filters, 2017, Journal of the American Statistical Association.

[42] Wang, et al. Nonuniversal critical dynamics in Monte Carlo simulations, 1987, Physical Review Letters.

[43] Mu-Chen Chen, et al. Credit scoring with a data mining approach based on support vector machines, 2007, Expert Syst. Appl.

[44] G. Roberts, et al. Kinetic energy choice in Hamiltonian/hybrid Monte Carlo, 2017, Biometrika.

[45] Christian P. Robert, et al. Monte Carlo Statistical Methods, 2005, Springer Texts in Statistics.

[46] Beat Neuenschwander, et al. Combining MCMC with ‘sequential’ PKPD modelling, 2009, Journal of Pharmacokinetics and Pharmacodynamics.

[47] R. Tibshirani, et al. Least angle regression, 2004, arXiv:math/0406456.

[48] Jesús María Sanz-Serna, et al. Geometric integrators and the Hamiltonian Monte Carlo method, 2017, Acta Numerica.

[49] R. Douc, et al. Practical drift conditions for subgeometric rates of convergence, 2004, arXiv:math/0407122.

[50] J. Heng, et al. Unbiased Hamiltonian Monte Carlo with couplings, 2017, Biometrika.

[51] U. Wolff. Comparison Between Cluster Monte Carlo Algorithms in the Ising Model, 1989.

[52] Lu Ma, et al. Blind Identification Based on Expectation-Maximization Algorithm Coupled With Blocked Rhee–Glynn Smoothing Estimator, 2018, IEEE Communications Letters.

[53] Y. Atchadé. An Adaptive Version for the Metropolis Adjusted Langevin Algorithm with a Truncated Drift, 2006.

[54] Esa Nummelin, et al. MC's for MCMC'ists, 2002.

[55] Oren Mangoubi, et al. Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions, 2017, arXiv:1708.07114.

[56] Mary Kathryn Cowles, et al. A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, 1998, Stat. Comput.

[57] Philip Heidelberger, et al. Bias Properties of Budget Constrained Simulations, 1990, Oper. Res.

[58] C. Villani. Optimal Transport: Old and New, 2008.

[59] Persi Diaconis, et al. Iterated Random Functions, 1999, SIAM Rev.

[60] Radford M. Neal. Bayesian Learning via Stochastic Dynamics, 1992, NIPS.

[61] Jun S. Liu, et al. Monte Carlo strategies in scientific computing, 2001.

[62] Peter Dalgaard, et al. R: A language and environment for statistical computing (R Development Core Team), 2010.

[63] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[64] James O. Berger, et al. Modularization in Bayesian analysis, with emphasis on analysis of computer models, 2009.

[65] Adrian Pagan, et al. Econometric Issues in the Analysis of Regressions with Generated Regressors, 1984.

[66] Lawrence C. McCandless, et al. Cutting Feedback in Bayesian Regression Adjustment for the Propensity Score, 2011, The International Journal of Biostatistics.

[67] Yang Chen, et al. On parallelizable Markov chain Monte Carlo algorithms with waste-recycling, 2018, Stat. Comput.

[68] Andrew Gelman, et al. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, 2011, J. Mach. Learn. Res.

[69] Christian P. Robert, et al. Better together? Statistical learning in models made of modules, 2017, arXiv:1708.08719.

[70] Ward Whitt, et al. The Asymptotic Efficiency of Simulation Estimators, 1992, Oper. Res.

[71] R. Douc, et al. Quantitative bounds on convergence of time-inhomogeneous Markov chains, 2004, arXiv:math/0503532.

[72] Arnaud Doucet, et al. Unbiased Markov chain Monte Carlo for intractable target distributions, 2020, Electronic Journal of Statistics.

[73] Kevin M. Murphy, et al. Estimation and Inference in Two-Step Econometric Models, 1985.

[74] Christophe Andrieu, et al. A tutorial on adaptive MCMC, 2008, Stat. Comput.

[75] Don McLeish, et al. A general method for debiasing a Monte Carlo estimator, 2010, Monte Carlo Methods Appl.

[76] Jeffrey S. Rosenthal, et al. Faithful Couplings of Markov Chains, 1997.

[77] A. Doucet, et al. An efficient computational approach for prior sensitivity analysis and cross-validation, 2010.

[78] H. Thorisson. Coupling, stationarity, and regeneration, 2000.

[79] C. Andrieu, et al. Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms, 2012, arXiv:1210.1484.

[80] Sylvia Richardson, et al. A Bayesian model of time activity data to investigate health effect of air pollution in time series studies, 2011.

[81] Xiangyu Wang, et al. Parallelizing MCMC with Random Partition Trees, 2015, NIPS.

[82] Jonathan C. Mattingly, et al. Coupling and Decoupling to bound an approximating Markov Chain, 2017.

[83] Sandhya Dwarkadas, et al. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference, 2002, Bioinform.

[84] Philip Heidelberger, et al. Analysis of parallel replicated simulations under a completion time constraint, 1991, TOMC.

[85] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[86] Martyn Plummer, et al. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling, 2003.

[87] S. F. Jarner, et al. Geometric ergodicity of Metropolis algorithms, 2000.

[88] Radford M. Neal, et al. Improving Markov chain Monte Carlo Estimators by Coupling to an Approximating Chain, 2001.

[89] A. Kennedy, et al. Hybrid Monte Carlo, 1988.

[90] P. Priouret, et al. Adaptive Markov chain Monte Carlo: theory and methods, 2011, Bayesian Time Series Models.

[91] J. Kadane, et al. Identification of Regeneration Times in MCMC Simulation, With Application to Adaptive Schemes, 2005.

[92] Radu Herbei, et al. Exact sampling for intractable probability distributions via a Bernoulli factory, 2012.

[93] G. Casella, et al. The Bayesian Lasso, 2008.

[94] Kshitij Khare, et al. Geometric ergodicity of the Bayesian lasso, 2013.

[95] P. Green, et al. Exact Sampling from a Continuous State Space, 1998.

[96] Valen E. Johnson. On Numerical Aspects of Bayesian Model Selection in High and Ultrahigh-dimensional Settings, 2013, Bayesian Analysis.

[97] D. Gaver, et al. Robust empirical Bayes analyses of event rates, 1987.

[98] M. Plummer, et al. CODA: convergence diagnosis and output analysis for MCMC, 2006.

[99] James M. Flegal, et al. Strong consistency of multivariate spectral variance estimators in Markov chain Monte Carlo, 2015, Bernoulli.

[100] V. Johnson. A Coupling-Regeneration Scheme for Diagnosing Convergence in Markov Chain Monte Carlo Algorithms, 1998.

[101] Radford M. Neal. Circularly-Coupled Markov Chain Sampling, 2017, arXiv:1711.04399.

[102] S. Rosenthal, et al. Faithful couplings of Markov chains: now equals forever, 1995.

[103] E. Hairer, et al. Geometric Numerical Integration: Structure Preserving Algorithms for Ordinary Differential Equations, 2004.

[104] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.

[105] M. Betancourt, et al. The Geometric Foundations of Hamiltonian Monte Carlo, 2014, arXiv:1410.5110.

[106] Corwin M. Zigler, et al. The Central Role of Bayes’ Theorem for Joint Estimation of Causal Effects and Propensity Scores, 2013, The American Statistician.

[107] Martyn Plummer, et al. International Correlation between Human Papillomavirus Prevalence and Cervical Cancer Incidence, 2008, Cancer Epidemiology Biomarkers & Prevention.

[108] A. Gelman, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms, 1997.

[109] A. Eberle, et al. Coupling and convergence for Hamiltonian Monte Carlo, 2018, The Annals of Applied Probability.

[110] Peter W. Glynn, et al. A new approach to unbiased estimation for SDE's, 2012, Proceedings of the 2012 Winter Simulation Conference (WSC).

[111] Jonathan R. Goodman, et al. Ensemble samplers with affine invariance, 2010.

[112] H. Tjelmeland, et al. Using all Metropolis-Hastings proposals to estimate mean values, 2004.

[113] Christian P. Robert, et al. Using Parallel Computation to Improve Independent Metropolis–Hastings Based Estimation, 2010, arXiv.

[114] Christopher Yau, et al. The Hamming Ball Sampler, 2015, Journal of the American Statistical Association.

[115] Ben Calderhead, et al. A general construction for parallelizing Metropolis-Hastings algorithms, 2014, Proceedings of the National Academy of Sciences.

[116] J. Rosenthal, et al. General state space Markov chains and MCMC algorithms, 2004, arXiv:math/0404033.

[117] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[118] David West, et al. Neural network credit scoring models, 2000, Comput. Oper. Res.

[119] Martyn Plummer. Cuts in Bayesian graphical models, 2015, Stat. Comput.

[120] Daniela De Angelis, et al. Massively parallel MCMC for Bayesian hierarchical models, 2017.

[121] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[122] Matti Vihola, et al. Unbiased Estimators and Multilevel Monte Carlo, 2015, Oper. Res.