GoGP: scalable geometric-based Gaussian process for online regression

One of the most challenging problems in Gaussian process regression is coping with large-scale datasets and with online learning settings in which data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently to large-scale datasets. Our proposed model is constructed from the geometric and optimization views of Gaussian process regression, and is hence termed geometric-based online GP (GoGP). We develop theory guaranteeing that, with a fast convergence rate, our algorithm always yields a sparse solution that approximates the true optimum to any level of precision specified a priori. Moreover, to further speed up GoGP when it is paired with a positive semi-definite, shift-invariant kernel such as the well-known Gaussian kernel, and to address the curse of kernelization, wherein the model size grows linearly with the data accumulated over time in online learning, we propose approximating the original kernel with a random Fourier feature kernel. The resulting model, GoGP with random Fourier features (GoGP-RF), can be stored directly in a finite-dimensional random feature space; it therefore avoids the curse of kernelization and scales efficiently and effectively to large-scale datasets. We extensively evaluated our proposed methods against state-of-the-art baselines on several large-scale datasets for the online regression task. The experimental results show that our GoGP variants deliver comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup over their rivals in the online setting. More importantly, their convergence behavior, which our theoretical analysis guarantees, is rapid and stable while achieving lower errors.
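To make the kernel approximation concrete, below is a minimal sketch in Python/NumPy of the random Fourier feature idea behind GoGP-RF; it is not the authors' implementation. The Gaussian kernel is replaced by an explicit D-dimensional random feature map, so the regression model is a fixed-size weight vector updated online and never grows with the stream. The feature dimension D, bandwidth sigma, learning rate, and toy data stream are all illustrative assumptions.

```python
# Sketch of the GoGP-RF idea: approximate a shift-invariant Gaussian kernel
# with random Fourier features, then learn a fixed-size linear model online.
# All hyperparameters below (D, sigma, lr) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 5        # input dimension (assumed)
D = 256      # number of random features; the model size is fixed at D
sigma = 1.0  # Gaussian kernel bandwidth
lr = 0.1     # SGD learning rate (illustrative)

# Sample the feature map once: z(x)^T z(y) ~= exp(-||x - y||^2 / (2 sigma^2))
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def features(x):
    """Random Fourier feature map z(x) for the Gaussian kernel."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

w = np.zeros(D)  # weights live in the finite-dimensional feature space

# Online regression: each (x_t, y_t) arrives once; one SGD step on squared loss.
for t in range(1000):
    x_t = rng.normal(size=d)
    y_t = np.sin(x_t.sum())           # toy target, for illustration only
    z_t = features(x_t)
    y_hat = w @ z_t                   # prediction before the update
    w -= lr * (y_hat - y_t) * z_t     # gradient step; the model never grows
```

Because the feature map is sampled once and the weight vector has fixed dimension D, each update costs O(dD) no matter how many instances have been observed, which is the sense in which the curse of kernelization is avoided.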
