Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

We present a new variational inference algorithm for Gaussian process regression with non-conjugate likelihood functions, with application to a wide array of problems including binary and multi-class classification, and ordinal regression. Our method constructs a concave lower bound that is optimized using an efficient fixed-point updating algorithm. We show that the new algorithm has highly competitive computational complexity, matching that of alternative approximate inference methods. We also prove that the use of concave variational bounds provides stable and guaranteed convergence - a property not available to other approaches. We show empirically for both binary and multi-class classification that our new algorithm converges much faster than existing variational methods, and without any degradation in performance.

[1]  S. Chib,et al.  Bayesian analysis of binary and polychotomous response data , 1993 .

[2]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1995 .

[3]  Michael I. Jordan,et al.  A variational approach to Bayesian logistic regression problems and their extensions , 1996 .

[4]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[5]  Carl E. Rasmussen,et al.  Assessing Approximate Inference for Binary Gaussian Process Classification , 2005, J. Mach. Learn. Res..

[6]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[7]  Mark Girolami,et al.  Variational Bayesian Multinomial Probit Regression with Gaussian Process Priors , 2006, Neural Computation.

[8]  C. Holmes,et al.  Bayesian auxiliary variable models for binary and multinomial regression , 2006 .

[9]  Guillaume Bouchard Efficient Bounds for the Softmax Function and Applications to Approximate Inference in Hybrid models , 2008 .

[10]  Florian Steinke,et al.  Bayesian Inference and Optimal Design in the Sparse Linear Model , 2007, AISTATS.

[11]  R. Tibshirani,et al.  Sparse inverse covariance estimation with the graphical lasso. , 2008, Biostatistics.

[12]  C. Rasmussen,et al.  Approximations for Binary Gaussian Process Classification , 2008 .

[13]  H. Rue,et al.  Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations , 2009 .

[14]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[15]  Manfred Opper,et al.  The Variational Gaussian Approximation Revisited , 2009, Neural Computation.

[16]  Jon D. McAuliffe,et al.  Variational Inference for Large-Scale Models of Discrete Choice , 2007, 0712.2526.

[17]  S. Frühwirth-Schnatter,et al.  Data Augmentation and MCMC for Binary and Multinomial Logit Models , 2010 .

[18]  David Barber,et al.  Concave Gaussian Variational Approximations for Inference in Large-Scale Bayesian Linear Models , 2011, AISTATS.

[19]  Mohammad Emtiyaz Khan,et al.  Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models , 2011, ICML.

[20]  Aki Vehtari,et al.  Robust Gaussian Process Regression with a Student-t Likelihood , 2011, J. Mach. Learn. Res..

[21]  Matthias W. Seeger,et al.  Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference , 2010, AISTATS.

[22]  S. L. Scott Data augmentation, frequentist estimation, and the Bayesian analysis of multinomial logit models , 2011 .

[23]  Mohammad Emtiyaz Khan,et al.  A Stick-Breaking Likelihood for Categorical Data Analysis with Latent Gaussian Models , 2012, AISTATS.