Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets

In many modern applications, difficulty in evaluating the posterior density makes even a single MCMC step slow. This difficulty can be caused by intractable likelihood functions, but it also arises in routine problems with large data sets. Many researchers have responded by running approximate versions of MCMC algorithms. In this note, we develop quantitative bounds for establishing the ergodicity of these approximate samplers. We then use these bounds to study the bias-variance trade-off of approximate MCMC algorithms. We apply our results to simple versions of recently proposed algorithms, including a variant of the "austerity" framework of Korattikara et al.
