Variational inference in nonconjugate models

Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximate the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and available in closed form. However, many models of interest, such as the correlated topic model and Bayesian logistic regression, are nonconjugate. In these models, mean-field methods cannot be applied directly, and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models: Laplace variational inference and delta method variational inference. Our methods have several advantages: they make variational algorithms easy to derive for a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We study our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.
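The abstract does not spell out the updates. The following is a minimal sketch under stated assumptions, showing how one coordinate-ascent loop can mix a closed-form conjugate update with a Laplace-style update for a nonconjugate factor. It is not the paper's exact algorithm. The toy model is hypothetical: a one-dimensional Bayesian logistic regression with coefficient w, prior w ~ N(0, 1/tau), and tau ~ Gamma(a0, b0) in rate form. The functions `sigmoid`, `grad_hess`, `laplace_update`, and `fit` are illustrative names. The nonconjugate factor q(w) is assumed Gaussian. Its mean is placed at the maximizer of f(w), the expected log of every term involving w, and its variance is the negative inverse curvature of f there. The conditionally conjugate factor q(tau) receives the standard closed-form Gamma update.

```python
import math

# Hypothetical toy model (an assumption for illustration, not from the paper):
#   y_i ~ Bernoulli(sigmoid(x_i * w)),  w ~ N(0, 1/tau),  tau ~ Gamma(a0, b0) (rate form).
# Assumed mean-field family: q(w) = N(mu, s2) (nonconjugate), q(tau) = Gamma(a, b).


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def grad_hess(w, xs, ys, e_tau):
    """First and second derivatives of
    f(w) = sum_i [y_i x_i w - log(1 + exp(x_i w))] - E[tau] w^2 / 2,
    the expected log (under q(tau)) of every term that involves w."""
    grad, hess = -e_tau * w, -e_tau
    for x, y in zip(xs, ys):
        p = sigmoid(x * w)
        grad += x * (y - p)
        hess -= x * x * p * (1.0 - p)
    return grad, hess


def laplace_update(xs, ys, e_tau, w0=0.0, iters=50, tol=1e-10):
    """Laplace-style update for q(w): mean = argmax f, variance = -1 / f''(mean)."""
    w = w0
    for _ in range(iters):
        grad, hess = grad_hess(w, xs, ys, e_tau)
        step = grad / hess
        w -= step  # Newton step toward the mode; f is strictly concave, so the mode is unique
        if abs(step) < tol:
            break
    _, hess = grad_hess(w, xs, ys, e_tau)
    return w, -1.0 / hess


def fit(xs, ys, a0=1.0, b0=1.0, sweeps=100):
    """Coordinate ascent: Laplace-style step for w, closed-form Gamma step for tau."""
    a, b = a0, b0
    mu, s2 = 0.0, b0 / a0  # start q(w) at the prior variance 1 / E[tau]
    for _ in range(sweeps):
        mu, s2 = laplace_update(xs, ys, a / b, w0=mu)
        # Conditionally conjugate update: q(tau) = Gamma(a0 + 1/2, b0 + E[w^2] / 2),
        # where E[w^2] = mu^2 + s2 under the Gaussian q(w).
        a = a0 + 0.5
        b = b0 + 0.5 * (mu * mu + s2)
    return mu, s2, a, b
```

For example, `fit([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 1, 0, 1, 1])` returns a positive mean for q(w). The gradient of f at w = 0 is positive, and f is concave, so the mode lies above zero. Newton's method is a convenient inner optimizer here because f is smooth and strictly concave. Other models may need a safeguarded or gradient-based maximizer instead.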
