Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets, since prediction based on the learned model requires inversion of an n × n kernel matrix. Sparse approximations of Gaussian Processes have been proposed to address this problem. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the resulting models lack interpretability, and interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that computes a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction carries valuable domain information, and we then adapt the sparse inverse covariance estimation used in Gaussian graphical models to estimate the Gaussian kernel. We solve the resulting optimization problem using the alternating direction method of multipliers, which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability, and interpretability on a climate data set.

In many application domains, it is important to predict the value of one feature based on certain other measured features. For example, in the Earth Sciences, predicting the precipitation at one location given the humidity, sea surface temperature, cloud cover, and other related factors is an important problem in climate modeling. For such problems, simple linear regression based on minimization of the mean squared error between the true and predicted values can be used to model the relationship between the input and target features. In decision support systems that use these predictive algorithms, a prediction with low confidence may be treated differently than the same prediction given with high confidence. Thus, while the predicted value from the regression function is clearly important, the confidence in the prediction is equally important. A simple model such as linear regression does not provide this information. Moreover, models like linear regression, in spite of being easy to fit and highly scalable, fail to capture nonlinear relationships in the data. Gaussian Process regression (GPR) is a regression model that can capture nonlinear relationships and outputs a distribution for the prediction, where the variance of the predicted distribution acts as a measure of confidence in the prediction. In addition, the inverse kernel (or covariance) matrix has many interesting properties from the Gaussian graphical model perspective that can be exploited to better understand relationships among the training examples. Depending on the nature of the data, these relationships can indicate dependencies (causalities) for certain models. However, prediction based on the GPR method requires inversion of a kernel (or covariance) matrix of size n × n, where n is the number of training instances. This kernel inversion becomes a bottleneck for very large data sets.
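To make the inversion bottleneck concrete, the following is a minimal sketch of exact GPR prediction in NumPy, following the standard formulation (as in Rasmussen and Williams); the squared-exponential kernel, the noise level, and all names here are illustrative assumptions rather than the specific model used in this work.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel; a common default choice."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """Exact GP regression prediction.

    Forming and factorizing the n x n kernel matrix K costs O(n^2) memory
    and O(n^3) time, which is the bottleneck discussed in the text.
    """
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(n)   # n x n kernel matrix
    L = np.linalg.cholesky(K)                                   # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(X_test, X_train)                        # m x n cross-covariance
    mean = K_star @ alpha                                       # predictive mean
    v = np.linalg.solve(L, K_star.T)
    var = rbf_kernel(X_test, X_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var                                            # mean and confidence
```

For n = 10^5 training points, K already has 10^10 entries (about 80 GB in double precision), which illustrates why exact GPR does not scale to very large data sets.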
Most of the existing methods for efficient computation in GPR involve numerical approximation techniques that exploit data sparsity. While this does speed up computation, these approximations are typically agnostic to the input domain and therefore do not preserve the interpretable similarity structure in the data.
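In contrast to such approximation schemes, the approach outlined above estimates a sparse inverse kernel (covariance) matrix directly, in the spirit of Gaussian graphical models. As a rough illustration of the kind of iteration this involves, the following is a generic ADMM sketch for the graphical lasso objective (in the style of Boyd et al.), not the authors' exact algorithm; the penalty parameter rho, the iteration count, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def graphical_lasso_admm(S, lam, rho=1.0, n_iter=200):
    """Sparse inverse covariance estimation (graphical lasso) via ADMM.

    Solves  min_{Theta > 0}  -logdet(Theta) + tr(S Theta) + lam * ||Theta||_1,
    where S is the empirical covariance matrix.
    """
    p = S.shape[0]
    Z = np.eye(p)          # sparse copy of Theta
    U = np.zeros((p, p))   # scaled dual variable
    for _ in range(n_iter):
        # Theta-update: closed form via eigendecomposition of rho*(Z - U) - S
        w, V = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (w + np.sqrt(w**2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (V * theta_eig) @ V.T
        # Z-update: elementwise soft-thresholding enforces sparsity
        Z = soft_threshold(Theta + U, lam / rho)
        # dual update
        U = U + Theta - Z
    return Z
```

The shrinkage and dual-update steps decompose across matrix entries, which is what makes ADMM-style splitting attractive for parallel computation, as noted in the abstract.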