Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
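The title points to a regression view of the fit. The sketch below illustrates one reading of that idea in the simplest possible case and is not the authors' algorithm. It assumes a one-dimensional Gaussian approximation with sufficient statistics (x, x^2), and regresses the unnormalized log posterior on those statistics using draws from the current approximation; the fitted coefficients are read as natural parameters. The target `log_target`, the helper names, and all numeric settings are hypothetical and chosen for illustration. Because the toy target is itself Gaussian, a single least-squares pass is enough here; a non-Gaussian target would need repeated, stochastic updates.

```python
import random

def log_target(x, mu=1.5, sigma=0.7):
    """Unnormalized log density of a hypothetical Gaussian 'posterior'.
    The additive constant stands in for the unknown normalizer."""
    return -0.5 * ((x - mu) / sigma) ** 2 + 3.0

def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination
    with partial pivoting (pure Python, no external libraries)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def regress_natural_params(log_p, q_mean, q_std, n_samples=2000, seed=0):
    """Least-squares regression of log_p(x) on [1, x, x^2], with x drawn
    from the current Gaussian approximation N(q_mean, q_std^2).
    Returns the coefficients on x and x^2 (the intercept is discarded)."""
    rng = random.Random(seed)
    xs = [rng.gauss(q_mean, q_std) for _ in range(n_samples)]
    feats = [[1.0, x, x * x] for x in xs]
    ys = [log_p(x) for x in xs]
    # Normal equations: (F^T F) beta = F^T y
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    _, eta1, eta2 = solve(A, b)
    return eta1, eta2

def to_mean_std(eta1, eta2):
    """Convert Gaussian natural parameters (mu/sigma^2, -1/(2 sigma^2))
    to mean and standard deviation."""
    var = -1.0 / (2.0 * eta2)
    return eta1 * var, var ** 0.5

# Start from a deliberately poor approximation, N(0, 2^2).
eta1, eta2 = regress_natural_params(log_target, q_mean=0.0, q_std=2.0)
mean, std = to_mean_std(eta1, eta2)
print(mean, std)
```

In this toy setting the recovered mean and standard deviation should match those of the hypothetical target (1.5 and 0.7) up to floating-point error, regardless of the starting approximation, because the log target is exactly linear in the chosen statistics. The additive constant in `log_target` is absorbed by the intercept, which is why the posterior only needs to be known up to its proportionality constant.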
