Scalable Gaussian Process for Extreme Classification

We address the limitations of Gaussian processes for multiclass classification in the setting where both the number of classes and the number of observations are very large. We propose a scalable approximate inference framework that combines the inducing points method with variational approximations of the likelihood recently proposed in the literature. This leads to a tractable lower bound on the marginal likelihood that decomposes into a sum over both data points and class labels and is hence amenable to doubly stochastic optimization. To overcome memory issues when dealing with large datasets, we resort to amortized inference, which, coupled with subsampling over classes, reduces the computational and memory footprint without a significant loss in performance. We demonstrate empirically that the proposed algorithm leads to superior test accuracy and improved detection of tail labels.
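The doubly stochastic idea can be illustrated with a minimal sketch. It assumes only that an objective is a plain double sum over data points and classes; the array `terms`, the function name, and the batch sizes are hypothetical stand-ins and are not the paper's actual bound or implementation. Sampling a mini-batch of rows and a subset of columns independently, then rescaling, gives an unbiased estimate of the full sum.

```python
import numpy as np

def doubly_stochastic_estimate(terms, batch_size, num_classes_sampled, rng):
    """Unbiased estimate of terms.sum() from a random sub-block.

    terms[n, k] is a hypothetical per-data-point, per-class contribution
    to an objective that decomposes as a double sum.
    """
    num_points, num_classes = terms.shape
    # Subsample data points and classes independently, without replacement.
    rows = rng.choice(num_points, size=batch_size, replace=False)
    cols = rng.choice(num_classes, size=num_classes_sampled, replace=False)
    block = terms[np.ix_(rows, cols)]
    # Each entry is included with probability
    # (batch_size / num_points) * (num_classes_sampled / num_classes),
    # so dividing by that probability makes the estimate unbiased.
    scale = (num_points / batch_size) * (num_classes / num_classes_sampled)
    return scale * block.sum()

rng = np.random.default_rng(0)
terms = rng.normal(size=(50, 20))  # toy values, not real data
estimate = doubly_stochastic_estimate(terms, batch_size=10,
                                      num_classes_sampled=5, rng=rng)
```

Each call touches only `batch_size * num_classes_sampled` entries instead of the full table, which is the source of the savings when both dimensions are large; the price is variance in the estimate, which shrinks as either sample size grows.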
