Sparse Multiscale Gaussian Process Regression

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this, for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations that are no worse, and typically better, than were previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.
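To make the central idea concrete, here is a minimal sketch in Python: each of the m basis functions is a Gaussian with its own centre and its own diagonal length scales, and the weights are fit in O(m^2 n) time by a subset-of-regressors style least-squares solve. The function names (`gaussian_basis`, `fit_predict`) and the fixed, hand-chosen centres and scales are illustrative assumptions, not the paper's exact estimator or its marginal-likelihood optimisation.

```python
import numpy as np

def gaussian_basis(X, centers, length_scales):
    """Evaluate m Gaussian basis functions, each with its own diagonal
    length scales, at the n input points X.

    X             : (n, d) inputs
    centers       : (m, d) basis centres
    length_scales : (m, d) per-basis, per-dimension length scales
    Returns       : (n, m) design matrix Phi
    """
    diff = X[:, None, :] - centers[None, :, :]                 # (n, m, d)
    sq = np.sum((diff / length_scales[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * sq)

def fit_predict(X, y, Xstar, centers, length_scales, noise=0.1):
    """Subset-of-regressors style fit and prediction.

    Forming Phi.T @ Phi costs O(m^2 n), the dominant term for m << n.
    """
    Phi = gaussian_basis(X, centers, length_scales)            # (n, m)
    Phis = gaussian_basis(Xstar, centers, length_scales)       # (n*, m)
    A = Phi.T @ Phi + noise ** 2 * np.eye(centers.shape[0])    # (m, m)
    w = np.linalg.solve(A, Phi.T @ y)                          # basis weights
    return Phis @ w

# Toy usage: 1-D regression with m = 3 bases of differing widths.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)
centers = np.array([[-2.0], [0.0], [2.0]])
scales = np.array([[0.5], [1.0], [0.3]])   # per-basis length scales
print(fit_predict(X, y, np.array([[1.0]]), centers, scales))
```

In the method described above, the centres and per-basis length scales would themselves be optimised by gradient ascent on the marginal likelihood; the sketch fixes them by hand only to keep the example short.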
