Scalable Gaussian Processes, with Guarantees: Kernel Approximations and Deep Feature Extraction

We provide a linear-time inference framework for Gaussian processes that supports automatic feature extraction through deep neural networks together with low-rank kernel approximations. Importantly, we derive approximation guarantees that bound the Kullback-Leibler divergence between the idealized Gaussian process and the one obtained from a low-rank approximation to its kernel, for two types of approximation. These yield two instantiations of our framework: Deep Fourier Gaussian Processes, based on random Fourier feature low-rank approximations, and Deep Mercer Gaussian Processes, based on truncating the Mercer expansion of the kernel. An extensive experimental evaluation of both instantiations on a broad collection of real-world datasets provides strong evidence that they outperform a wide range of state-of-the-art methods in terms of time efficiency, negative log-predictive density, and root mean squared error.
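To illustrate why a low-rank kernel approximation gives linear-time inference, the sketch below uses the standard random Fourier feature (RFF) construction: the RBF kernel is replaced by an inner product of D random cosine features, and exact GP regression reduces to Bayesian linear regression in the feature space, costing O(ND^2) rather than O(N^3). This is a minimal sketch under assumed choices (RBF kernel, function names, and hyperparameters are illustrative), not the authors' implementation of Deep Fourier Gaussian Processes, which additionally extracts features with a deep network before applying the approximation.

```python
# Minimal sketch: linear-time GP regression via random Fourier features (RFF).
# The RBF kernel k(x, x') = s^2 * exp(-||x - x'||^2 / (2 l^2)) is approximated
# by phi(x)^T phi(x'), where phi(x) = sqrt(2 s^2 / D) * cos(W x + b),
# W ~ N(0, I / l^2) and b ~ Uniform(0, 2*pi).  Inference is then Bayesian
# linear regression in D dimensions: O(N D^2) instead of O(N^3).
import numpy as np

def rff_features(X, W, b):
    """Map inputs X (N x d) to D random Fourier features."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def rff_gp_regression(X_train, y_train, X_test,
                      lengthscale=1.0, signal_var=1.0, noise_var=0.1,
                      D=500, seed=0):
    """Posterior predictive mean and variance of an RFF-approximated GP."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(D, d))
    b = rng.uniform(0.0, 2 * np.pi, size=D)

    Phi = np.sqrt(signal_var) * rff_features(X_train, W, b)   # N x D
    Phi_s = np.sqrt(signal_var) * rff_features(X_test, W, b)  # M x D

    # Bayesian linear regression: y = Phi w + eps, w ~ N(0, I), eps ~ N(0, noise_var I).
    A = Phi.T @ Phi + noise_var * np.eye(D)                    # D x D
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y_train))

    mean = Phi_s @ alpha
    # Predictive variance: noise_var * (1 + diag(Phi_s A^{-1} Phi_s^T)).
    V = np.linalg.solve(L, Phi_s.T)                            # D x M
    var = noise_var * (1.0 + np.sum(V ** 2, axis=0))
    return mean, var
```

The Mercer-based instantiation follows the same pattern, with the random features replaced by the leading eigenfunctions of the kernel's Mercer expansion, truncated to a finite rank.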
