Scalable Gaussian Process Classification with Additive Noise for Various Likelihoods

Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over the function space. Conventional GPCs, however, suffer from (i) poor scalability for big data due to the full kernel matrix, and (ii) intractable inference due to non-Gaussian likelihoods. Hence, various scalable GPCs have been proposed through (i) sparse approximation built upon a small inducing set to reduce the time complexity, and (ii) approximate inference to derive an analytical evidence lower bound (ELBO). However, these scalable GPCs equipped with an analytical ELBO are limited to specific likelihoods or require additional assumptions. In this work, we present a unifying framework that accommodates scalable GPCs with various likelihoods. Analogous to GP regression (GPR), we introduce additive noise to augment the probability space: (i) for GPCs with step, (multinomial) probit, and logit likelihoods, via internal variables; and, in particular, (ii) for the GPC with softmax likelihood, via the noise variables themselves. This leads to unified scalable GPCs with an analytical ELBO obtained by variational inference. Empirically, our GPCs achieve better results than state-of-the-art scalable GPCs on extensive binary and multi-class classification tasks with up to two million data points.
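To make the additive-noise construction concrete, the identities below sketch the two augmentation routes mentioned above; this is our illustrative reading of the standard results such a framework builds on, not equations quoted from the paper. A probit likelihood arises from a step likelihood on a Gaussian-perturbed latent (the perturbation playing the role of the internal variable), and a softmax likelihood arises from an argmax over Gumbel-perturbed latents, so the noise variables themselves carry the augmentation:

\[
p(y \mid f) = \int \mathbb{I}\{\, y\,(f + \epsilon) > 0 \,\}\, \mathcal{N}(\epsilon \mid 0, 1)\, \mathrm{d}\epsilon = \Phi(y f), \qquad y \in \{-1, +1\},
\]
\[
p(y = c \mid \mathbf{f}) = \Pr\big(f_c + \epsilon_c \ge f_{c'} + \epsilon_{c'} \ \text{for all } c'\big) = \frac{e^{f_c}}{\sum_{c'} e^{f_{c'}}}, \qquad \epsilon_c \overset{\mathrm{iid}}{\sim} \mathrm{Gumbel}(0, 1).
\]

In both cases, marginalizing the noise recovers the original likelihood exactly, while keeping the noise explicit in the augmented model is what permits the analytical ELBO described above.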
