Constant-Time Predictive Distributions for Gaussian Processes

One of the most compelling features of Gaussian process (GP) regression is its ability to provide well-calibrated posterior distributions. Recent advances in inducing point methods have sped up GP marginal likelihood and posterior mean computations, leaving posterior covariance estimation and sampling as the remaining computational bottlenecks. In this paper, we address these bottlenecks by using the Lanczos algorithm to rapidly approximate the predictive covariance matrix. Our approach, which we refer to as LOVE (LanczOs Variance Estimates), substantially improves time and space complexity. In our experiments, LOVE computes covariances up to 2,000 times faster and draws samples 18,000 times faster than existing methods, all without sacrificing accuracy.
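
To make the central idea concrete, the following is a minimal dense NumPy sketch of how a few Lanczos iterations can yield a cached low-rank factor for approximating predictive variances. It is an illustration under stated assumptions, not the paper's implementation: the function names `lanczos` and `approx_predictive_variance`, the choice of probe vector (the mean of the cross-covariance columns), the use of full reorthogonalization, and the dense training covariance are all choices made for this sketch. It relies on the standard GP identity that the predictive variance is k(x*, x*) - k*^T K^{-1} k*, with K^{-1} replaced by Q T^{-1} Q^T from the Lanczos factorization.

```python
import numpy as np


def lanczos(matvec, b, k, tol=1e-10):
    """Run up to k Lanczos iterations on a symmetric operator.

    Returns Q (n x r, orthonormal columns) and tridiagonal T (r x r)
    with Q.T @ A @ Q = T, where r <= k (r < k if the Krylov space is exhausted).
    Full reorthogonalization is used for numerical stability (a sketch choice).
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    q = b / np.linalg.norm(b)
    r = k
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Remove components along all previous Lanczos vectors (done twice).
        for _ in range(2):
            w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < tol:
                r = j + 1  # invariant subspace found; stop early
                break
            q = w / beta[j]
    off = beta[: r - 1]
    T = np.diag(alpha[:r]) + np.diag(off, 1) + np.diag(off, -1)
    return Q[:, :r], T


def approx_predictive_variance(K_train, k_star, k_ss, rank):
    """Approximate diag(k_ss - k_star.T @ K_train^{-1} @ k_star).

    K_train: (n x n) training covariance including noise (assumed SPD).
    k_star:  (n x m) cross-covariances between training and test points.
    k_ss:    (m,) prior variances at the test points.
    """
    probe = k_star.mean(axis=1)  # assumption: probe with the mean column
    Q, T = lanczos(lambda v: K_train @ v, probe, rank)
    L = np.linalg.cholesky(T)    # T = L L^T
    R = np.linalg.solve(L, Q.T)  # cached factor: K^{-1} is approximated by R^T R
    proj = R @ k_star            # (r x m); cheap to reuse for new test points
    return k_ss - np.sum(proj ** 2, axis=0)
```

Once `R` has been computed, each additional test point costs only a product with a small `rank`-row matrix rather than a fresh linear solve against the full training covariance. Because Q T^{-1} Q^T never exceeds K^{-1} in the positive semidefinite order, this sketch errs on the side of overestimating the variance when `rank` is too small.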
