Variational Inference for Sparse Spectrum Approximation in Gaussian Process Regression

Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to answer this but are known to over-fit. We suggest the use of variational inference for the sparse spectrum approximation to avoid both issues. We model the covariance function with a finite Fourier series approximation and treat it as a random variable. The random covariance function has a posterior, on which a variational distribution is placed. The variational distribution transforms the random covariance function to fit the data. We study the properties of our approximate inference, compare it to alternative ones, and extend it to the distributed and stochastic domains. Our approximation captures complex functions better than standard approaches and avoids over-fitting.

[1]  R. V. Churchill,et al.  Lectures on Fourier Integrals , 1959 .

[2]  David J. C. MacKay,et al.  The Evidence Framework Applied to Classification Networks , 1992, Neural Computation.

[3]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[4]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[5]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[6]  Carl E. Rasmussen,et al.  Gaussian Processes in Reinforcement Learning , 2003, NIPS.

[7]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[8]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[9]  Shie Mannor,et al.  Reinforcement learning with Gaussian processes , 2005, ICML.

[10]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[11]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[12]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[13]  Michalis K. Titsias,et al.  Variational Learning of Inducing Variables in Sparse Gaussian Processes , 2009, AISTATS.

[14]  Neil D. Lawrence,et al.  Bayesian Gaussian Process Latent Variable Model , 2010, AISTATS.

[15]  Nando de Freitas,et al.  A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.

[16]  Carl E. Rasmussen,et al.  Sparse Spectrum Gaussian Process Regression , 2010, J. Mach. Learn. Res..

[17]  Georg Lindgren,et al.  Stationary Stochastic Processes: Theory and Applications , 2012 .

[18]  Andrew Gordon Wilson,et al.  Gaussian Process Kernels for Pattern Discovery and Extrapolation , 2013, ICML.

[19]  Neil D. Lawrence,et al.  Deep Gaussian Processes , 2012, AISTATS.

[20]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[21]  Joshua B. Tenenbaum,et al.  Structure Discovery in Nonparametric Regression through Compositional Kernel Search , 2013, ICML.

[22]  Neil D. Lawrence,et al.  Gaussian Processes for Big Data , 2013, UAI.

[23]  Carl E. Rasmussen,et al.  Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models , 2014, NIPS.

[24]  Andrew Gordon Wilson,et al.  Fast Kernel Learning for Multidimensional Pattern Extrapolation , 2014, NIPS.