Validated Variational Inference via Practical Posterior Error Bounds

Variational inference has become an increasingly attractive, fast alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, a major obstacle to the widespread use of variational methods is the lack of post-hoc accuracy measures that are both theoretically justified and computationally efficient. In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference. Our bounds are widely applicable, as they require only that the approximating and exact posteriors have polynomial moments. Our bounds are also computationally efficient for variational inference because they require only standard values from variational objectives, straightforward analytic calculations, and simple Monte Carlo estimates. We show that our analysis naturally leads to a new and improved workflow for validated variational inference. Finally, we demonstrate the utility of our proposed workflow and error bounds on a robust regression problem and on a real-data example with a widely used multilevel hierarchical model.
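
To make the computation concrete, below is a minimal sketch of the kind of post-hoc check the abstract describes, assuming a multivariate Gaussian variational approximation q = N(mu, Sigma). It Monte Carlo estimates the ELBO (a lower bound on the log marginal likelihood) and the order-2 CUBO (an upper bound), combines them into an upper bound on KL(pi || q), and then converts that into a 2-Wasserstein bound via a standard transportation-cost inequality for Gaussians, which in turn bounds the error of the posterior mean estimate. The function `log_joint` and all names here are hypothetical, the Gaussian/transportation-cost assumption is ours, and the constants should be checked against the paper before serious use; note also that the Monte Carlo CUBO estimate is itself noisy, so the printed value is an estimate of a bound rather than a guaranteed bound.

```python
# Illustrative sketch only: assumes q = N(mu, Sigma) and a user-supplied,
# vectorized log_joint(theta) returning log p(theta, x) for each row of theta.
import numpy as np
from scipy.special import logsumexp

def kl_upper_bound(log_joint, mu, Sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the upper bound KL(pi || q) <= 2 * (CUBO_2 - ELBO)."""
    rng = np.random.default_rng(seed)
    theta = rng.multivariate_normal(mu, Sigma, size=n_samples)
    # Evaluate log q(theta) for the Gaussian approximation via its Cholesky factor.
    d = len(mu)
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (theta - mu).T)
    log_q = (-0.5 * np.sum(z**2, axis=0)
             - np.sum(np.log(np.diag(L)))
             - 0.5 * d * np.log(2 * np.pi))
    log_w = log_joint(theta) - log_q                 # log importance weights
    elbo = np.mean(log_w)                            # ELBO <= log Z
    cubo2 = 0.5 * (logsumexp(2 * log_w) - np.log(n_samples))  # CUBO_2 >= log Z
    return 2.0 * (cubo2 - elbo)

def mean_error_bound(log_joint, mu, Sigma, **kwargs):
    """Bound on ||E_pi[theta] - E_q[theta]|| via a 2-Wasserstein bound.

    Uses the quadratic transportation-cost inequality satisfied by a Gaussian:
    W_2(pi, q)^2 <= 2 * lambda_max(Sigma) * KL(pi || q), and the fact that
    the norm of the mean difference is at most W_1 <= W_2.
    """
    kl = kl_upper_bound(log_joint, mu, Sigma, **kwargs)
    lam_max = np.linalg.eigvalsh(Sigma)[-1]
    return np.sqrt(2.0 * lam_max * kl)               # also bounds W_2(pi, q)
```

The design point this illustrates is the one the abstract emphasizes: every quantity involved (ELBO, CUBO, and the eigenvalues of Sigma) is either a standard variational objective value or a cheap analytic computation, so the check can be run after any variational fit without touching the inference code.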
