Gaussian Kullback-Leibler approximate inference

We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular, we make the following novel contributions: we describe sufficient conditions under which the G-KL objective is differentiable and convex; we provide constrained parameterisations of the Gaussian covariance that make G-KL methods fast and scalable; we prove that the lower bound on the normalisation constant provided by G-KL methods dominates those provided by local lower bounding methods; and we discuss complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods. We present numerical results comparing G-KL with other deterministic Gaussian approximate inference methods for robust Gaussian process regression models with either Student-t or Laplace likelihoods, large-scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design.
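To make the object of study concrete, the following is a minimal sketch of a G-KL lower bound on the log normalisation constant for a small Bayesian binary logistic regression model. It is an illustration under stated assumptions, not the authors' implementation. The abstract fixes no notation, so the symbols are assumed here: a Gaussian prior N(w | mu, Sigma), site vectors h_n stored as the rows of `H`, labels y_n in {-1, +1}, and a Gaussian approximation q(w) = N(w | m, S) with S = C C^T for a lower-triangular factor `C`. The function name `gkl_bound` and the use of Gauss-Hermite quadrature for the one-dimensional site expectations are likewise choices made for this sketch.

```python
import numpy as np


def gkl_bound(m, C, mu, Sigma, H, y, n_quad=40):
    """Lower bound on log Z, where Z = int N(w|mu,Sigma) prod_n sigmoid(y_n h_n^T w) dw.

    q(w) = N(w | m, S) with S = C C^T (C lower triangular; an assumed parameterisation).
    bound = entropy of q + <log N(w|mu,Sigma)>_q + sum_n <log sigmoid(y_n h_n^T w)>_q
    """
    D = m.shape[0]
    S = C @ C.T

    # Entropy of a Gaussian: 0.5 * log det(2*pi*e*S), with log det S = 2 * sum(log |diag C|).
    entropy = 0.5 * D * np.log(2.0 * np.pi * np.e) + np.sum(np.log(np.abs(np.diag(C))))

    # Expected log prior density under q (closed form).
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    diff = m - mu
    prior_term = -0.5 * (
        D * np.log(2.0 * np.pi)
        + logdet_Sigma
        + np.trace(Sigma_inv @ S)
        + diff @ Sigma_inv @ diff
    )

    # Each site depends on w only through h_n^T w, which is a univariate Gaussian
    # under q with mean h_n^T m and variance h_n^T S h_n.
    site_mean = H @ m
    site_std = np.sqrt(np.einsum("nd,de,ne->n", H, S, H))

    # Gauss-Hermite nodes and weights for the weight function exp(-z^2 / 2);
    # dividing by sqrt(2*pi) turns them into standard-normal expectation weights.
    z, wts = np.polynomial.hermite_e.hermegauss(n_quad)
    wts = wts / np.sqrt(2.0 * np.pi)

    a = y[:, None] * (site_mean[:, None] + site_std[:, None] * z[None, :])
    log_sigmoid = -np.logaddexp(0.0, -a)  # numerically stable log(sigmoid(a))
    site_term = np.sum(log_sigmoid @ wts)

    return entropy + prior_term + site_term


# Toy one-dimensional problem (all values are made up for illustration).
mu = np.zeros(1)
Sigma = np.eye(1)
H = np.array([[1.0], [-0.5], [2.0]])
y = np.array([1.0, -1.0, 1.0])

bound_at_prior = gkl_bound(np.zeros(1), np.eye(1), mu, Sigma, H, y)
bound_shifted = gkl_bound(np.array([0.5]), np.array([[0.7]]), mu, Sigma, H, y)
```

In practice the bound would be maximised over `m` and `C` with a gradient-based optimiser; the sketch only evaluates it at two fixed settings. Restricting `C` (for example to a banded or otherwise sparse factor) is one way a constrained covariance parameterisation could reduce cost, though the specific constraints studied in the paper are not stated in the abstract.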
