Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

We consider probabilistic multinomial probit classification using Gaussian process (GP) priors. The challenges in multiclass GP classification are the integration over the non-Gaussian posterior distribution and the growth in the number of unknown latent variables as the number of target classes increases. Expectation propagation (EP) has proven to be a very accurate method for approximate inference, but the existing EP approaches for multinomial probit GP classification rely on numerical quadratures or on independence assumptions between the latent values of different classes to facilitate the computations. In this paper, we propose a novel nested EP approach that does not require numerical quadratures and accurately approximates all between-class posterior dependencies of the latent values, yet still scales linearly in the number of classes. The predictive accuracy of the nested EP approach is compared to Laplace, variational Bayes, and Markov chain Monte Carlo (MCMC) approximations on various benchmark data sets. In the experiments, nested EP was the method most consistent with MCMC sampling, but the differences between the compared methods were small when only classification accuracy was considered.
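
To make the likelihood concrete, the following is a minimal sketch that evaluates a multinomial probit likelihood for a single observation. It assumes the standard auxiliary-variable form with independent unit-variance Gaussian noise, p(y | f) = E_{u ~ N(0,1)} [ prod_{j != y} Phi(u + f_y - f_j) ], which the abstract itself does not spell out. The one-dimensional expectation is computed here with Gauss-Hermite quadrature purely for illustration; this is not the nested EP scheme, which is described as avoiding numerical quadrature. The names multinomial_probit_likelihood, std_normal_cdf, and n_nodes are hypothetical.

```python
import math
import numpy as np

# Vectorized error function; otypes is set so empty inputs are also handled.
_erf = np.vectorize(math.erf, otypes=[float])


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), computed from the error function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def multinomial_probit_likelihood(f, y, n_nodes=50):
    """Approximate p(y | f) = E_{u~N(0,1)} prod_{j != y} Phi(u + f[y] - f[j]).

    f       : latent values for one input, one entry per class
    y       : index of the observed class
    n_nodes : number of Gauss-Hermite nodes (illustrative choice)
    """
    f = np.asarray(f, dtype=float)
    # Nodes and weights for the weight function exp(-u^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    # Normalize so the weights integrate against the N(0, 1) density.
    weights = weights / math.sqrt(2.0 * math.pi)
    others = np.delete(f, y)  # latent values of the competing classes
    # Rows index quadrature nodes, columns index competing classes.
    args = nodes[:, None] + f[y] - others[None, :]
    return float(np.sum(weights * np.prod(std_normal_cdf(args), axis=1)))


if __name__ == "__main__":
    f = [0.5, -1.0, 2.0]
    probs = [multinomial_probit_likelihood(f, k) for k in range(len(f))]
    print(probs, sum(probs))
```

Under this assumed form the class probabilities for a fixed f sum to one, and equal latent values give a uniform distribution over classes, which is a quick sanity check on any implementation.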
