Sparse Orthogonal Variational Inference for Gaussian Processes

We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods. It is based on decomposing a Gaussian process as the sum of two independent processes: one spanned by a finite basis of inducing points and the other capturing the remaining variation. We show that this formulation recovers existing approximations while also yielding tighter lower bounds on the marginal likelihood and new stochastic variational inference algorithms. We demonstrate the efficiency of these algorithms on several Gaussian process models, ranging from standard regression to multi-class classification with (deep) convolutional Gaussian processes, and report state-of-the-art results on CIFAR-10 among purely GP-based models.
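To make the decomposition concrete, here is a minimal sketch of the standard conditional split of a Gaussian process prior along a set of inducing points; the symbols below (inducing inputs Z, inducing variables u = f(Z), kernel matrices K) follow common sparse-GP conventions and are not taken verbatim from the paper:

\[
f(\cdot) \;=\; \underbrace{k_{\cdot Z}\, K_{ZZ}^{-1}\, u}_{\text{spanned by the inducing basis}} \;+\; \underbrace{f_{\perp}(\cdot)}_{\text{residual process}},
\qquad
f_{\perp} \sim \mathcal{GP}\!\big(0,\; k(x, x') - k_{xZ}\, K_{ZZ}^{-1}\, k_{Zx'}\big),
\]

where f_\perp is independent of u by construction, so a variational posterior can be placed on the two components separately.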
