Sparse Gaussian process approximations and applications

Many tasks in machine learning require learning some kind of input-output relation (a function), for example recognising handwritten digits (from image to number) or learning the motion of a dynamical system such as a pendulum (from current positions and velocities to future ones). We consider this problem in the Bayesian framework, where probability distributions represent the state of uncertainty of a learning agent. In particular, we investigate methods that use Gaussian processes to represent distributions over functions. Gaussian process models require approximations in order to be practically useful. This thesis focuses on understanding existing approximations and on investigating new ones tailored to specific applications. We first advance the understanding of existing techniques through a thorough review. We propose desiderata for approximations to non-parametric basis function models, which we use to assess the existing approximations. Following this, we perform an in-depth empirical investigation of two popular approximations (VFE and FITC). Based on the insights gained, we propose a new inter-domain Gaussian process approximation, which can be used to increase the sparsity of the approximation compared to regular inducing point approximations. This allows GP models to be stored and communicated more compactly. Next, we show that inter-domain approximations can also enable models that would otherwise be impractical, rather than only improving approximations to existing ones. We introduce an inter-domain approximation for the Convolutional Gaussian process, a model that makes Gaussian processes suitable for image inputs and that has strong relations to convolutional neural networks. The same technique is also valuable for approximating Gaussian processes with more general invariance properties. Finally, we revisit the derivation of the Gaussian process State Space Model and discuss some subtleties relating to its approximation. We hope that this thesis illustrates some benefits of non-parametric models and of approximating them in a non-parametric fashion, and that it provides models and approximations that will prove useful for the development of more complex and performant models in the future.
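To make the inducing-point idea behind VFE concrete, the sketch below shows a minimal NumPy implementation of a Titsias-style variational lower bound for sparse GP regression with a squared-exponential kernel. This is an illustrative sketch under my own assumptions, not code from the thesis: the helper names (`rbf`, `vfe_bound`), the kernel settings, and the toy data are all made up for the example, and the bound is computed with dense solves for readability rather than the Cholesky-based implementation one would use in practice.

```python
import numpy as np


def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between two input sets."""
    sqdist = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)


def vfe_bound(X, y, Z, noise_variance=0.1, **kern):
    """Variational lower bound (VFE) for sparse GP regression with inducing inputs Z.

    Written with dense solves for clarity; a practical implementation would
    use Cholesky factorisations and scale as O(N M^2) rather than O(N^3).
    """
    N, M = X.shape[0], Z.shape[0]
    Kuu = rbf(Z, Z, **kern) + 1e-6 * np.eye(M)           # inducing-point covariance (jittered)
    Kuf = rbf(Z, X, **kern)                               # cross-covariance
    Qff = Kuf.T @ np.linalg.solve(Kuu, Kuf)               # Nystrom approximation of Kff
    kff_diag = np.full(N, kern.get("variance", 1.0))      # RBF prior variance on the diagonal
    Sigma = Qff + noise_variance * np.eye(N)

    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    trace = np.sum(kff_diag - np.diag(Qff)) / noise_variance  # penalises badly placed Z

    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad + trace)


# Toy usage: 200 noisy sine observations summarised by 10 inducing inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = np.linspace(-3.0, 3.0, 10)[:, None]
print(vfe_bound(X, y, Z, noise_variance=0.01))
```

The final trace term penalises the mismatch between the exact covariance and its low-rank approximation, and its presence or absence is one of the behavioural differences between VFE and FITC that the empirical comparison described in the abstract examines.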
