Bounding Wasserstein distance with couplings

Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in sampling methods such as approximate MCMC, which trade off asymptotic consistency for improved computational speed. In this article, we propose estimators based on couplings of Markov chains to assess the quality of such asymptotically biased sampling methods. The estimators yield empirical upper bounds on the Wasserstein distance between the limiting distribution of the asymptotically biased sampling method and the original target distribution of interest. We establish theoretical guarantees for our upper bounds and show that our estimators can remain effective in high dimensions. We apply our quality measures to stochastic gradient MCMC, variational Bayes, and Laplace approximations for tall data, and to approximate MCMC for Bayesian logistic regression in 4500 dimensions and Bayesian linear regression in 50000 dimensions.
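To make the coupling idea concrete, the following is a minimal sketch, not the paper's estimator: since any coupling (X, Y) of two distributions satisfies E||X - Y|| >= W1, running an exact-gradient and a stochastic-gradient unadjusted Langevin chain with shared Gaussian innovations (a common-random-number coupling) and averaging the terminal distance over independent replicates gives an empirical upper bound on the 1-Wasserstein distance between the two chains' distributions at that iteration. The standard Gaussian target, the step size, and the synthetic gradient noise below are assumptions chosen purely for the demonstration.

    # Hypothetical demo (assumptions: Gaussian target, synthetic gradient noise).
    # CRN coupling of two ULA chains; E||X_T - Y_T|| upper-bounds W1 at time T,
    # because any coupling's expected cost dominates the infimum defining W1.
    import numpy as np

    rng = np.random.default_rng(0)
    d, step, n_iters, n_reps = 10, 0.01, 2000, 100

    def grad_log_target(x):
        # Standard Gaussian target: grad log pi(x) = -x.
        return -x

    def noisy_grad_log_target(x):
        # Stand-in for a subsampled/stochastic gradient (demo assumption).
        return -x + 0.5 * rng.standard_normal(x.shape)

    dists = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.standard_normal(d)   # chain driven by exact gradients
        y = x.copy()                 # chain driven by noisy gradients
        for _ in range(n_iters):
            z = rng.standard_normal(d)  # shared noise = CRN coupling
            x = x + step * grad_log_target(x) + np.sqrt(2 * step) * z
            y = y + step * noisy_grad_log_target(y) + np.sqrt(2 * step) * z
        dists[r] = np.linalg.norm(x - y)

    # Monte Carlo estimate of E||X_T - Y_T||, an empirical W1 upper bound.
    se = dists.std(ddof=1) / np.sqrt(n_reps)
    print(f"empirical W1 upper bound: {dists.mean():.4f} +/- {se:.4f}")

Common random numbers is the simplest coupling one can run here; contractive constructions such as reflection couplings typically keep the paired chains closer and so yield tighter bounds on the limiting distributions.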
