Yes, but Did It Work?: Evaluating Variational Inference

While it's always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness-of-fit measurement for joint distributions, while simultaneously reducing the error in the estimate. The variational simulation-based calibration (VSBC) diagnostic assesses the average performance of point estimates.
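To give a feel for what an importance-ratio diagnostic looks at, the following is a minimal sketch and not the paper's algorithm. It assumes a toy one-dimensional setup (standard normal target p, zero-mean normal approximation q) and uses the Hill estimator as a simpler stand-in for the generalized Pareto fit that PSIS performs. The function names and the tail-size rule are illustrative assumptions.

```python
import math
import random


def log_ratios(q_sd, n_draws, rng):
    """Log importance ratios log p(x) - log q(x) for draws x ~ q.

    Toy assumption: p is N(0, 1) and q is N(0, q_sd^2).
    The shared -0.5*log(2*pi) constant cancels and is omitted.
    """
    out = []
    for _ in range(n_draws):
        x = rng.gauss(0.0, q_sd)
        log_p = -0.5 * x * x
        log_q = -0.5 * (x / q_sd) ** 2 - math.log(q_sd)
        out.append(log_p - log_q)
    return out


def hill_tail_shape(log_r, n_tail):
    """Hill estimate of the tail shape of the ratios r = exp(log_r).

    Averages the log-excesses of the n_tail largest ratios over the
    next-largest one. Larger values mean a heavier right tail.
    """
    top = sorted(log_r, reverse=True)[: n_tail + 1]
    threshold = top[-1]
    return sum(v - threshold for v in top[:-1]) / n_tail


rng = random.Random(0)
n = 4000
# Assumed rule of thumb for how many draws count as "the tail".
n_tail = min(n // 5, int(3 * math.sqrt(n)))

# q narrower than p: ratios are heavy-tailed, a warning sign.
k_narrow = hill_tail_shape(log_ratios(0.5, n, rng), n_tail)
# q wider than p: ratios are bounded, so the estimate is near zero.
k_wide = hill_tail_shape(log_ratios(2.0, n, rng), n_tail)
print(f"narrow q: {k_narrow:.2f}, wide q: {k_wide:.2f}")
```

In this toy case the under-dispersed approximation yields a clearly larger tail-shape estimate than the over-dispersed one, which is the kind of signal a tail-based diagnostic relies on.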
