Variational Fourier Features for Gaussian Processes

This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. The result is an approximation that inherits the benefits of the variational approach while gaining the representational power and computational scalability of spectral representations. The work hinges on a key result: there exist spectral features, defined on a finite domain of the Gaussian process, whose covariances are almost independent. We derive these expressions for Matérn kernels in one dimension and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the data set, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data points and M is the number of features.
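
To give a feel for where a drop from O(NM^2) to O(NM) can come from, the following is a minimal sketch and not the paper's algorithm. It assumes the feature covariance Kuu is exactly diagonal, a simplification of the "almost-independent" structure described above, and compares two ways of computing diag(Kfu Kuu^{-1} Kuf), a quantity that commonly appears in sparse Gaussian process computations. The function names, the random stand-in for the cross-covariance, and the sizes M and N are all hypothetical.

```python
import numpy as np

def qff_diag_dense(Kuf, Kuu):
    """diag(Kfu Kuu^{-1} Kuf) for a dense Kuu: O(M^3 + N M^2)."""
    L = np.linalg.cholesky(Kuu)        # Kuu = L L^T
    A = np.linalg.solve(L, Kuf)        # A = L^{-1} Kuf, shape (M, N)
    return np.sum(A * A, axis=0)       # column-wise squared norms

def qff_diag_diagonal(Kuf, kuu_diag):
    """Same quantity when Kuu is assumed diagonal: O(N M)."""
    return np.sum(Kuf * Kuf / kuu_diag[:, None], axis=0)

# Hypothetical sizes and a random stand-in for the cross-covariance Kuf.
rng = np.random.default_rng(0)
M, N = 5, 200
Kuf = rng.standard_normal((M, N))
kuu_diag = rng.uniform(0.5, 2.0, size=M)

dense = qff_diag_dense(Kuf, np.diag(kuu_diag))
fast = qff_diag_diagonal(Kuf, kuu_diag)
print(np.allclose(dense, fast))
```

Both routes return the same vector when Kuu is diagonal, but the second never forms or factorizes an M-by-M matrix, so its cost grows linearly in M for each data point.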
