Quantifying the accuracy of approximate diffusions and Markov chains

Markov chains and diffusion processes are indispensable tools in machine learning and statistics that are used for inference, sampling, and modeling. With the growth of large-scale datasets, the computational cost associated with simulating these stochastic processes can be considerable, and many algorithms have been proposed to approximate the underlying Markov chain or diffusion. A fundamental question is how the computational savings trade off against the statistical error incurred due to approximations. This paper develops general results that address this question. We bound the Wasserstein distance between the equilibrium distributions of two diffusions as a function of their mixing rates and the deviation in their drifts. We show that this error bound is tight in simple Gaussian settings. Our general result on continuous diffusions can be discretized to provide insights into the computational-statistical trade-off of Markov chains. As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics. We characterize computation-constrained settings where, by using fast-to-compute approximate gradients in the Langevin dynamics, we obtain more accurate samples compared to using the exact gradients. Finally, as an additional application of our approach, we quantify the accuracy of approximate zig-zag sampling. Our theoretical analyses are supported by simulation experiments.
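The Gaussian setting mentioned above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the one-dimensional target N(0, 1/alpha), the constant gradient error `eps`, the parameter values, and the helper name `ula`. The exact chain uses the gradient -alpha * x and the approximate chain uses -alpha * x + eps. Both chains have the same stationary variance, so the Wasserstein distance between their equilibrium distributions equals the shift in means, eps / alpha. This grows with the drift deviation and shrinks as the mixing rate alpha increases. Driving both chains with the same noise makes the comparison free of Monte Carlo error.

```python
import numpy as np

def ula(grad_log_density, x0, step, noise):
    """Unadjusted Langevin iteration driven by a pre-drawn noise array.

    noise has shape (n_steps, n_chains); each row drives one update.
    """
    x = np.array(x0, dtype=float)
    for xi in noise:
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * xi
    return x

# Illustrative values (assumptions, not from the paper).
alpha, eps = 2.0, 0.1            # mixing rate and gradient error
step, n_steps, n_chains = 0.01, 2000, 1000

rng = np.random.default_rng(0)
noise = rng.standard_normal((n_steps, n_chains))
x0 = np.zeros(n_chains)

# The same noise drives both chains (a synchronous coupling).
exact = ula(lambda x: -alpha * x, x0, step, noise)
approx = ula(lambda x: -alpha * x + eps, x0, step, noise)

# With shared noise, each pair of chains differs by a deterministic offset
# that converges geometrically to eps / alpha.
mean_gap = abs(approx.mean() - exact.mean())
print(mean_gap, eps / alpha)
```

Because the offset between coupled chains obeys d_{k+1} = (1 - step * alpha) * d_k + step * eps, its limit is exactly eps / alpha for any step size with 0 < step * alpha < 2. In this toy case the discretization therefore does not change the gap between the two equilibrium means.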
