Robust Gaussian Process Regression with a Student-t Likelihood

This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is that inference is analytically intractable, which is why several approximate methods have been proposed. Expectation propagation (EP) has been found to be very accurate in many empirical studies, but its convergence is known to be problematic for models containing non-log-concave site functions. In this paper we illustrate situations where standard EP fails to converge and review modifications and alternative algorithms for improving convergence. We demonstrate that convergence problems may occur during type-II maximum a posteriori (MAP) estimation of the hyperparameters and show that standard EP may fail to converge at the MAP values for some difficult data sets. We present a robust implementation that relies primarily on parallel EP updates and, in difficult cases, uses a moment-matching-based double-loop algorithm with an adaptively selected step size. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations.
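To make the non-log-concavity concrete, the following is a minimal sketch, not the paper's implementation. It assumes the usual parametrization of the Student-t observation model with degrees of freedom nu and scale sigma; these symbols and the function names are illustrative assumptions and are not taken from the text above. Under that assumption, the second derivative of log p(y | f) with respect to the latent value f is (nu + 1)(r^2 - nu sigma^2) / (r^2 + nu sigma^2)^2 with r = y - f, which is negative near the observation and positive once |r| exceeds sigma sqrt(nu).

```python
import math


def student_t_logpdf(y, f, nu, sigma):
    """Log density of a Student-t observation y with location f.

    nu (degrees of freedom) and sigma (scale) follow the standard
    parametrization; this is an assumption, not the paper's notation.
    """
    r = y - f
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi * sigma**2)
        - 0.5 * (nu + 1) * math.log1p(r**2 / (nu * sigma**2))
    )


def d2_logpdf_df2(y, f, nu, sigma):
    """Analytic second derivative of the log density with respect to f."""
    r = y - f
    s = nu * sigma**2
    return (nu + 1) * (r**2 - s) / (r**2 + s) ** 2


def concavity_threshold(nu, sigma):
    """Residual magnitude |y - f| beyond which the curvature turns positive."""
    return sigma * math.sqrt(nu)


if __name__ == "__main__":
    nu, sigma, y = 4.0, 1.0, 0.0
    print("threshold |y - f|:", concavity_threshold(nu, sigma))
    for f in (0.0, 1.0, 3.0, 6.0):
        print(f"f = {f}: curvature = {d2_logpdf_df2(y, f, nu, sigma):+.4f}")
```

A positive curvature means that a distant observation contributes negative precision to a local Gaussian approximation of that likelihood term. This is one way to see why outlying observations make Gaussian site approximations delicate for this model.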
