Sparse Gaussian Process Regression via L1 Penalization

To handle massive data, a variety of sparse Gaussian process (GP) methods have been proposed to reduce the computational cost. Many of them essentially map the large dataset into a small set of basis points. A common approach to learning these basis points is evidence maximization. Nevertheless, evidence maximization may lead to overfitting and incur a high computational cost. In this paper, we propose a novel sparse GP regression approach, GPLasso, that explicitly represents the trade-off between approximation quality and model sparsity. GPLasso minimizes an l1-penalized Kullback-Leibler (KL) divergence between the exact and sparse GP posterior processes. Optimizing this convex cost function leads to sparse GP parameters. Furthermore, we use incomplete Cholesky factorization to obtain low-rank matrix approximations that speed up the optimization procedure. Experimental results on synthetic and real data demonstrate that, compared with several state-of-the-art sparse GP methods and a direct low-rank matrix approximation method, GPLasso achieves a significantly improved trade-off between prediction accuracy and computational cost.
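To make the construction concrete, one schematic reading of the objective described above is the following, where the exact parameterization and weighting matrix are assumptions made here for illustration rather than the paper's precise formulation:

\[
\min_{\alpha \in \mathbb{R}^{n}} \;\; \tfrac{1}{2}\,(\mathbf{m} - K\alpha)^{\top} \Sigma^{-1} (\mathbf{m} - K\alpha) \;+\; \lambda \lVert \alpha \rVert_{1},
\]

where \(\mathbf{m}\) and \(\Sigma\) denote the exact GP posterior mean and covariance at the training inputs, \(K\) is the kernel matrix, \(\alpha\) are the weights of the sparse representation, and \(\lambda > 0\) controls sparsity. Because the KL divergence between two Gaussians sharing a covariance reduces to a quadratic form in the difference of their means, a penalized objective of this kind is convex in \(\alpha\), and substituting an incomplete Cholesky factorization \(K \approx G G^{\top}\) with \(G \in \mathbb{R}^{n \times k}\), \(k \ll n\), lowers the per-iteration cost of the optimization.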
