Sparse inverse kernel Gaussian Process regression

Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression (GPR) is a popular technique for modeling the input–output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply GPR to large data sets, since prediction based on the learned model requires inverting an n × n kernel matrix, where n is the number of training points; direct inversion costs O(n^3) time. Sparse approximations to Gaussian Processes have been proposed to address this scalability problem. However, in almost all cases, these techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these approximations sometimes provide excellent accuracy, the resulting models are not interpretable. Interpretable sparsity patterns are important for many applications. We propose a new technique for sparse GPR that computes a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information, and we then adapt inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the resulting optimization problem using the alternating direction method of multipliers (ADMM), which is amenable to parallel computation. We compare the performance of this algorithm with existing methods for sparse covariance regression in terms of both speed and accuracy. We demonstrate the performance of our method in terms of accuracy, scalability, and interpretability on two satellite data sets from the climate domain.

© 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 205–220, 2013
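In its standard form, the inverse covariance estimation problem from Gaussian graphical models is the graphical lasso: minimize tr(S Θ) − log det Θ + λ‖Θ‖₁ over positive definite Θ, where S is a covariance matrix. The following is a minimal sketch of the generic ADMM iteration for that problem, following the covariance-selection example in Boyd et al.'s ADMM monograph. It is not the authors' implementation. How the paper maps this problem onto the kernel matrix, the penalty parameter `rho`, penalizing the diagonal, and the stopping rule are all assumptions here, and the function names `soft_threshold` and `admm_sparse_inverse_cov` are hypothetical.

```python
import numpy as np


def soft_threshold(a, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)


def admm_sparse_inverse_cov(S, lam, rho=1.0, max_iter=500, tol=1e-6):
    """Hypothetical sketch of ADMM for
        min_X  tr(S X) - log det X + lam * ||X||_1
    using the splitting X = Z: the log-det term acts on X, the l1 penalty
    on Z, and U is the scaled dual variable. Assumption: the l1 penalty
    is applied to every entry, including the diagonal.
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(max_iter):
        # X-update: optimality gives rho*X - X^{-1} = rho*(Z - U) - S.
        # Diagonalize the symmetric right-hand side and solve per eigenvalue.
        eigvals, Q = np.linalg.eigh(rho * (Z - U) - S)
        x = (eigvals + np.sqrt(eigvals**2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T  # Q @ diag(x) @ Q.T, positive definite by construction
        # Z-update: soft-thresholding sets small entries exactly to zero.
        Z_old = Z
        Z = soft_threshold(X + U, lam / rho)
        # Scaled dual update.
        U = U + X - Z
        # Stop when the primal (X - Z) and dual (Z - Z_old) residuals are small.
        r_norm = np.linalg.norm(X - Z)
        s_norm = rho * np.linalg.norm(Z - Z_old)
        if r_norm < tol and s_norm < tol:
            break
    return Z
```

Each iteration costs one p × p symmetric eigendecomposition plus elementwise operations. The elementwise soft-thresholding and dual updates are what make ADMM convenient to parallelize. As a usage sketch, calling `admm_sparse_inverse_cov(S, lam=0.5)` on `S = np.array([[1.0, 0.1], [0.1, 1.0]])` returns a diagonal matrix: the weak off-diagonal coupling is thresholded to exactly zero, and larger `lam` values generally zero out more entries. That sparsity pattern is what one would inspect for interpretability.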
