Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

Bayesian inference typically requires computing an approximation to the posterior distribution. An important requirement for an approximate Bayesian inference algorithm is that it output high-accuracy posterior mean and uncertainty estimates. Classical Monte Carlo methods, particularly Markov chain Monte Carlo, remain the gold standard for approximate Bayesian inference because they have a robust finite-sample theory and reliable convergence diagnostics. However, alternative methods, which are more scalable or apply to problems where Markov chain Monte Carlo cannot be used, lack the same finite-data approximation theory and tools for evaluating their accuracy. In this work, we develop a flexible new approach to bounding the error of mean and uncertainty estimates of scalable inference algorithms. Our strategy is to control the estimation errors in terms of Wasserstein distance, then bound the Wasserstein distance via a generalized notion of Fisher distance. Unlike the Wasserstein distance, whose computation requires access to the normalized posterior distribution, the Fisher distance is tractable to compute because it requires access only to the gradient of the log posterior density. We demonstrate the usefulness of our Fisher distance approach by deriving bounds on the Wasserstein error of the Laplace approximation and Hilbert coresets. We anticipate that our approach will be applicable to many other approximate inference methods such as the integrated Laplace approximation, variational inference, and approximate Bayesian computation.
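To illustrate why a Fisher-type distance needs only gradients of log densities, the following is a minimal sketch, not the construction used in the paper. It estimates the root-mean-square norm of the difference between two score functions by Monte Carlo, with samples drawn from the approximation. The function names (`gaussian_score`, `fisher_distance_mc`), the Gaussian stand-ins for the posterior and its approximation, and the choice of sampling distribution are all assumptions made for illustration. Because the gradient of a log normalizing constant is zero, an unnormalized log density would yield the same scores.

```python
import numpy as np


def gaussian_score(x, mean, cov_inv):
    """Gradient of the log density of N(mean, cov) at each row of x."""
    # For a symmetric precision matrix, row-wise -(x - mean) @ cov_inv
    # equals -cov_inv @ (x - mean) for every sample.
    return -(x - mean) @ cov_inv.T


def fisher_distance_mc(score_target, score_approx, samples):
    """Monte Carlo estimate of the L2 norm of the score difference,
    averaged over the distribution that produced `samples`."""
    diff = score_target(samples) - score_approx(samples)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))


# Hypothetical example: a standard normal "posterior" and a shifted
# Gaussian "approximation" with the same covariance.
rng = np.random.default_rng(0)
dim = 3
identity = np.eye(dim)
target_mean = np.zeros(dim)
approx_mean = np.array([0.3, -0.4, 0.0])

# Draw samples from the approximation, which is cheap to sample.
samples = rng.multivariate_normal(approx_mean, identity, size=2000)

estimate = fisher_distance_mc(
    lambda x: gaussian_score(x, target_mean, identity),
    lambda x: gaussian_score(x, approx_mean, identity),
    samples,
)
print(estimate)
```

In this toy case the two scores differ by a constant vector (the difference of the means), so the estimate equals the Euclidean norm of that vector regardless of the samples drawn. For a real posterior, `score_target` would be replaced by the gradient of the unnormalized log posterior, for example obtained by automatic differentiation.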
