Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research, including stochastic variational inference based on inducing inputs, kernel interpolation, and structure-exploiting algebra. The key idea of our method is to use the Tensor Train decomposition for the variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows training kernels based on deep neural networks without any modifications to the underlying GP model: a neural network learns a multidimensional embedding of the data, which the GP uses to make the final prediction. We train the GP and neural network parameters end-to-end, without pretraining, by maximizing the GP marginal likelihood. We demonstrate the efficiency of the proposed approach on several regression and classification benchmark datasets, including MNIST, CIFAR-10, and Airline.
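To make the key idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of how a Tensor Train representation can store a variational mean defined over a multidimensional grid of inducing inputs. It assumes a d-dimensional grid with a fixed number of inducing inputs per dimension and a uniform TT-rank; the function names `random_tt_cores` and `tt_element` are illustrative, not from the TT-GP code.

```python
import numpy as np

def random_tt_cores(mode_sizes, rank, seed=0):
    """Random TT cores for a tensor of shape `mode_sizes`.

    Core k has shape (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1.
    """
    rng = np.random.default_rng(seed)
    d = len(mode_sizes)
    ranks = [1] + [rank] * (d - 1) + [1]
    return [rng.standard_normal((ranks[k], mode_sizes[k], ranks[k + 1]))
            for k in range(d)]

def tt_element(cores, index):
    """Evaluate one entry mu[i_1, ..., i_d] of the TT tensor as a
    product of d small matrices (one slice per core)."""
    result = np.ones((1, 1))
    for core, i in zip(cores, index):
        result = result @ core[:, i, :]
    return float(result[0, 0])

# A grid of 10 inducing inputs per dimension in 10 dimensions holds
# 10**10 ("billions of") inducing inputs, yet the TT format stores only
# sum_k r_{k-1} * n_k * r_k parameters.
cores = random_tt_cores(mode_sizes=[10] * 10, rank=10)
num_params = sum(core.size for core in cores)
print(f"grid points: {10**10:,}  TT parameters: {num_params:,}")  # 10,000,000,000 vs 8,200
print("mu[0, ..., 0] =", tt_element(cores, [0] * 10))
```

Because the inducing inputs lie on a regular grid, the prior covariance over them has Kronecker structure, which is what the abstract's "structure-exploiting algebra" refers to; combined with the TT-format variational parameters, the variational objective can be evaluated without ever materializing the full 10^10-entry mean vector. The same machinery applies unchanged when the GP inputs are low-dimensional embeddings produced by a neural network.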
