Fully Scalable Gaussian Processes using Subspace Inducing Inputs

We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of handling a large number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with matrix-preconditioning-based parametrizations of the variational distributions that lead to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems, which carry the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by reporting predictive performance together with low computational times on datasets with extremely large numbers of instances and input dimensions.
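The abstract does not spell out the construction, but a minimal sketch of one plausible reading follows: the M inducing inputs are constrained to a k-dimensional linear subspace of the D-dimensional input space (for example, the span of the top-k right singular vectors of the data matrix), so only M·k coordinates are optimized instead of M·D. All variable names below are illustrative assumptions, not the paper's notation.

```python
# Sketch of subspace inducing inputs (assumed construction; names illustrative).
import numpy as np

rng = np.random.default_rng(0)
N, D = 2000, 1000   # many instances, high-dimensional inputs
M, k = 50, 20       # inducing points, subspace dimension

X = rng.standard_normal((N, D))  # stand-in for the training inputs

# Fixed subspace basis: top-k right singular vectors of the data matrix.
# (A truncated or randomized SVD would be used at scale.)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
B = Vt[:k]                        # (k, D) fixed basis

# Free variational parameters: inducing points in subspace coordinates,
# M*k numbers to optimize rather than M*D.
C = rng.standard_normal((M, k))

# Inducing inputs reconstructed in the original input space.
Z = C @ B                         # (M, D)

# Kernels that depend only on inner products never need Z explicitly:
# Z @ X.T == C @ (B @ X.T), so all Gram computations stay k-dimensional.
X_proj = X @ B.T                  # (N, k), precomputed once
cross_gram = C @ X_proj.T         # (M, N), equals Z @ X.T for a linear kernel
```

Under this reading, the cost of optimizing the inducing inputs no longer scales with the raw input dimension D, which is what would make the scheme viable for the extreme multi-label problems the abstract describes.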
