Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF)

We introduce a kernel approximation strategy that enables computation of the Gaussian process log marginal likelihood and all hyperparameter derivatives in $\mathcal{O}(p)$ time. Our GRIEF kernel consists of $p$ eigenfunctions found using a Nyström approximation from a dense Cartesian product grid of inducing points. By exploiting algebraic properties of Kronecker and Khatri-Rao tensor products, the computational complexity of the training procedure can be made practically independent of the number of inducing points. This allows us to use arbitrarily many inducing points to achieve a globally accurate kernel approximation, even in high-dimensional problems. The fast likelihood evaluation enables type-I or type-II Bayesian inference on large-scale datasets. We benchmark our algorithms on real-world problems with up to two million training points and $10^{33}$ inducing points.
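The role of the Kronecker and Khatri-Rao structure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a product (separable) squared-exponential kernel, and the names `rbf` and `grid_eigenfunction` are hypothetical helpers introduced here. Under that assumption the grid kernel matrix is a Kronecker product of small per-dimension matrices, its eigenvectors are Kronecker products of per-dimension eigenvectors, and the cross-covariance between the data and the grid is a row-wise Khatri-Rao product. An unnormalised Nyström eigenfunction can then be evaluated as a Hadamard product of small matrix-vector products, without ever forming anything of the size of the full grid.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D point sets (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def grid_eigenfunction(X, grids, index):
    """Evaluate one unnormalised Nystrom eigenfunction at the rows of X.

    grids[i] holds the 1-D inducing points along dimension i, and index[i]
    selects which eigenvector of that dimension's small kernel matrix to use.
    Returns the eigenfunction values and the matching grid eigenvalue.
    """
    phi = np.ones(X.shape[0])
    lam = 1.0
    for i, (g, j) in enumerate(zip(grids, index)):
        w, Q = np.linalg.eigh(rbf(g, g))      # small m_i x m_i eigenproblem
        phi *= rbf(X[:, i], g) @ Q[:, j]      # Hadamard accumulation over dimensions
        lam *= w[j]                           # eigenvalues multiply across dimensions
    return phi, lam

# Illustrative data: 6 points in 2-D and a 5 x 4 grid of inducing points.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(6, 2))
grids = [np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 4)]

# Leading eigenfunction: take the largest eigenvalue in every dimension.
phi, lam = grid_eigenfunction(X, grids, index=(-1, -1))
```

The cost of this evaluation grows with the sum of the per-dimension grid sizes rather than their product, which is what makes very large Cartesian grids affordable in this sketch.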
