Bounding Wasserstein distance with couplings

Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in sampling methods such as approximate MCMC, which trade off asymptotic consistency for improved computational speed. In this article, we propose estimators based on couplings of Markov chains to assess the quality of such asymptotically biased sampling methods. The estimators yield empirical upper bounds on the Wasserstein distance between the limiting distribution of the asymptotically biased sampling method and the original target distribution of interest. We establish theoretical guarantees for our upper bounds and show that our estimators can remain effective in high dimensions. We apply our quality measures to stochastic gradient MCMC, variational Bayes, and Laplace approximations for tall data, and to approximate MCMC for Bayesian logistic regression in 4500 dimensions and Bayesian linear regression in 50000 dimensions.
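To make the coupling idea concrete, the following is a minimal sketch, not the paper's estimator: since any coupling (X, Y) of two distributions satisfies E||X - Y|| >= W1, running an exact-gradient and a stochastic-gradient unadjusted Langevin chain with shared Gaussian innovations (a common-random-number coupling) and averaging the terminal distance over independent replicates gives an empirical upper bound on the 1-Wasserstein distance between the two chains' distributions at that iteration. The standard Gaussian target, the step size, and the synthetic gradient noise below are assumptions chosen purely for the demonstration.

    # Hypothetical demo (assumptions: Gaussian target, synthetic gradient noise).
    # CRN coupling of two ULA chains; E||X_T - Y_T|| upper-bounds W1 at time T,
    # because any coupling's expected cost dominates the infimum defining W1.
    import numpy as np

    rng = np.random.default_rng(0)
    d, step, n_iters, n_reps = 10, 0.01, 2000, 100

    def grad_log_target(x):
        # Standard Gaussian target: grad log pi(x) = -x.
        return -x

    def noisy_grad_log_target(x):
        # Stand-in for a subsampled/stochastic gradient (demo assumption).
        return -x + 0.5 * rng.standard_normal(x.shape)

    dists = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.standard_normal(d)   # chain driven by exact gradients
        y = x.copy()                 # chain driven by noisy gradients
        for _ in range(n_iters):
            z = rng.standard_normal(d)  # shared noise = CRN coupling
            x = x + step * grad_log_target(x) + np.sqrt(2 * step) * z
            y = y + step * noisy_grad_log_target(y) + np.sqrt(2 * step) * z
        dists[r] = np.linalg.norm(x - y)

    # Monte Carlo estimate of E||X_T - Y_T||, an empirical W1 upper bound.
    se = dists.std(ddof=1) / np.sqrt(n_reps)
    print(f"empirical W1 upper bound: {dists.mean():.4f} +/- {se:.4f}")

Common random numbers is the simplest coupling one can run here; contractive constructions such as reflection couplings typically keep the paired chains closer and so yield tighter bounds on the limiting distributions.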
