Generalized Variational Inference

This paper introduces a generalized representation of Bayesian inference. It is derived axiomatically and recovers existing Bayesian methods as special cases. We then use it to prove that variational inference (VI) based on the Kullback-Leibler divergence with a variational family Q produces the uniquely optimal Q-constrained approximation to the exact Bayesian inference problem. Surprisingly, this implies that standard VI dominates any other Q-constrained approximation to that problem. Alternative Q-constrained approximations, such as VI minimizing other divergences and Expectation Propagation, can therefore produce better posteriors than VI only by implicitly targeting more appropriate Bayesian inference problems. Inspired by this, we introduce Generalized Variational Inference (GVI), a modular approach for solving such alternative inference problems explicitly instead. We explore some applications of GVI, including robustness and better marginals. Lastly, we derive black box GVI and apply it to Bayesian Neural Networks and Deep Gaussian Processes, where GVI can comprehensively outperform competing methods.
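
To make the phrase "Q-constrained approximation" concrete, standard VI can be written as an optimization problem over the family Q. The notation below is assumed for illustration and does not appear in the abstract: a prior \pi(\theta), observations x_1, ..., x_n, a likelihood p(x_i | \theta), a generic loss \ell and a generic divergence D. It is a sketch of the general shape such objectives take, not a statement of the paper's theorems.

```latex
% Standard VI: minimize the KL divergence to the exact posterior over q in Q.
% Expanding the KL term shows this is equivalent to an expected negative
% log-likelihood plus a KL penalty towards the prior (up to a constant in q).
\begin{align*}
q^{*}_{\mathrm{VI}}
  &= \arg\min_{q \in \mathcal{Q}}
     \mathrm{KL}\bigl(q(\theta) \,\|\, p(\theta \mid x_{1:n})\bigr) \\
  &= \arg\min_{q \in \mathcal{Q}}
     \Bigl\{ \mathbb{E}_{q(\theta)}\Bigl[ -\sum_{i=1}^{n} \log p(x_i \mid \theta) \Bigr]
     + \mathrm{KL}\bigl(q \,\|\, \pi\bigr) \Bigr\}.
\end{align*}

% Assumed generalized form: replace the negative log-likelihood by a loss \ell
% and the KL penalty by a divergence D, keeping the same family Q.
\begin{equation*}
q^{*}
  = \arg\min_{q \in \mathcal{Q}}
    \Bigl\{ \mathbb{E}_{q(\theta)}\Bigl[ \sum_{i=1}^{n} \ell(\theta, x_i) \Bigr]
    + D\bigl(q \,\|\, \pi\bigr) \Bigr\}.
\end{equation*}
```

Read this way, the modularity mentioned in the abstract would correspond to choosing the loss, the divergence and the family Q separately; setting \ell = -\log p and D = \mathrm{KL} gives back the standard VI objective in the first display.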
