Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation

We propose a scalable multi-class Gaussian process classification approach built on a novel modified softmax likelihood function. The new likelihood has two benefits: it yields well-calibrated uncertainty estimates, and it admits an efficient latent variable augmentation. The augmented model is conditionally conjugate, which leads to a fast variational inference method based on block coordinate ascent updates. Previous approaches suffered from a trade-off between uncertainty calibration and speed. Our experiments show that our method achieves well-calibrated uncertainty estimates and competitive predictive performance while being up to two orders of magnitude faster than the state of the art.
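The abstract does not spell out the modified softmax likelihood. As a minimal sketch, assuming it takes the logistic-softmax form, in which each exponential in the softmax is replaced by a logistic sigmoid so that p(y = k | f) = sigma(f_k) / sum_c sigma(f_c), the class probabilities could be computed as follows. The function names and the example latent values are hypothetical and chosen only for illustration; the standard softmax is included for comparison.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_softmax(f):
    """Assumed modified likelihood: sigma(f_k) / sum_c sigma(f_c)."""
    s = [sigmoid(v) for v in f]
    total = sum(s)
    return [v / total for v in s]


def softmax(f):
    """Standard softmax, shifted by max(f) for numerical stability."""
    m = max(f)
    e = [math.exp(v - m) for v in f]
    total = sum(e)
    return [v / total for v in e]


# Hypothetical latent function values for one input and three classes.
f = [2.0, 0.5, -1.0]
print("logistic-softmax:", logistic_softmax(f))
print("softmax:         ", softmax(f))
```

Both functions return a valid probability vector and preserve the ranking of the latent values. Under the stated assumption, each numerator is a single logistic sigmoid, which is the kind of term that Pólya-Gamma-style latent variable augmentation is typically applied to.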
