Product Kernel Interpolation for Scalable Gaussian Processes

Recent work shows that inference for Gaussian processes can be performed efficiently using iterative methods that rely only on matrix-vector multiplications (MVMs). Structured Kernel Interpolation (SKI) exploits these techniques by deriving approximate kernels with very fast MVMs. Unfortunately, such strategies suffer badly from the curse of dimensionality. We develop a new technique for MVM-based learning that exploits product kernel structure. We demonstrate that this technique is broadly applicable: it yields SKI runtime that grows linearly rather than exponentially with dimension, and it achieves state-of-the-art asymptotic complexity for multi-task GPs.
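
To make the role of MVMs and product structure concrete, the following is a minimal sketch and not the authors' implementation. It assumes each component kernel matrix is available in a low-rank form K_i = U_i U_i^T (the factors `U1`, `U2`, the rank `r`, and the noise value are hypothetical choices made for illustration). It uses the standard identity that, for the elementwise product of K1 and K2, the MVM with v equals diag(K1 diag(v) K2^T), so the n x n product matrix never has to be formed. A plain conjugate gradient loop then solves the regularized system using only that MVM.

```python
import numpy as np


def lowrank_hadamard_mvm(U1, U2, v):
    """MVM with the elementwise product (U1 U1^T) * (U2 U2^T), without forming it.

    Entry i of the result is U1[i] @ M @ U2[i], where M = U1^T diag(v) U2.
    Cost is O(n * r1 * r2) instead of O(n^2).
    """
    M = U1.T @ (v[:, None] * U2)          # shape (r1, r2)
    return np.sum((U1 @ M) * U2, axis=1)  # shape (n,)


def conjugate_gradients(mvm, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()            # residual b - A x with x = 0
    p = r.copy()
    rs = r @ r
    if np.sqrt(rs) < tol:
        return x
    for _ in range(max_iter):
        Ap = mvm(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Illustrative data: random low-rank factors stand in for kernel approximations.
rng = np.random.default_rng(0)
n, r = 200, 5
U1 = rng.standard_normal((n, r))
U2 = rng.standard_normal((n, r))
v = rng.standard_normal(n)
y = rng.standard_normal(n)
noise = 0.1  # hypothetical observation noise variance

# Fast MVM versus the dense reference that forms the full n x n product.
fast = lowrank_hadamard_mvm(U1, U2, v)
K_dense = (U1 @ U1.T) * (U2 @ U2.T)
dense = K_dense @ v

# Solve (K + noise * I) x = y using only MVMs.
x = conjugate_gradients(lambda p: lowrank_hadamard_mvm(U1, U2, p) + noise * p, y)
```

The dense matrix `K_dense` is built here only to check the result; in a scalable setting only the factors would be stored. How the low-rank factors are obtained, and how the idea extends to more than two component kernels, is left open in this sketch.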
