Error bounds for Approximations of Markov chains

The first part of this article gives error bounds for approximations of Markov kernels under Foster-Lyapunov conditions. The basic idea is that when both the approximating kernel and the original kernel satisfy a Foster-Lyapunov condition, the long-time dynamics of the two chains -- as well as the invariant measures, when they exist -- will be close in a weighted total variation norm, provided that the approximation is sufficiently accurate. The required accuracy depends in part on the Lyapunov function, with more stable chains being more tolerant of approximation error. We are motivated by the recent growth in proposals for scaling Markov chain Monte Carlo algorithms to large datasets by defining an approximating kernel that is faster to sample from. Many of these proposals use only a small subset of the data points to construct the transition kernel, and we consider an application to this class of approximating kernels. We also consider applications to distribution approximations in Gibbs sampling. Approximating kernels are also commonly used in Metropolis algorithms for Gaussian process models, which are common in spatial statistics and nonparametric regression. In this setting, there are typically two sources of approximation error: discretization error and approximation of Metropolis acceptance ratios. Because the approximating kernel is obtained by discretizing the state space, it is singular with respect to the exact kernel. To analyze this application, we give additional results in Wasserstein metrics, in contrast to the preceding examples, which quantify the level of approximation in a total variation norm.
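
For readers unfamiliar with the terminology, the following is a sketch of the standard textbook forms of a Foster-Lyapunov (drift) condition and a weighted total variation norm. The symbols $P$, $V$, $\gamma$, $K$, $\mu$ and $\nu$ are notation assumed here for illustration, and the article's exact assumptions and norm may differ in detail.

```latex
% Foster-Lyapunov (geometric drift) condition for a Markov kernel P on a
% state space X, with Lyapunov function V : X -> [0, \infty):
\[
  (PV)(x) \;=\; \int_X V(y)\, P(x, dy) \;\le\; \gamma\, V(x) + K
  \qquad \text{for all } x \in X,
\]
% for some constants \gamma \in (0,1) and K \ge 0.

% A weighted total variation distance between probability measures
% \mu and \nu, with test functions bounded by 1 + V:
\[
  \| \mu - \nu \|_{V}
  \;=\; \sup_{|f| \le 1 + V} \left| \int_X f \, d\mu - \int_X f \, d\nu \right| .
\]
```

Under this reading, a smaller $\gamma$ corresponds to a more strongly contracting, and hence more stable, chain, which is the sense in which more stable chains tolerate larger approximation error.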
