Nonstationary Gaussian Process Regression using a Latent Extension of the Input Space

Robert Bosch GmbH, Corporate Research and Advance Engineering, Stuttgart
Max Planck Institute for Biological Cybernetics, Tübingen
{Tobias.Pfingsten, Malte.Kuss, Carl}@tuebingen.mpg.de

Introduction

Gaussian Processes (GPs) can be used to specify a prior over latent functions in non-parametric Bayesian models, e.g. for regression and classification. For this abstract we assume familiarity with the basic concepts of Gaussian Process models; see for example the introduction by MacKay [1]. A GP is defined by a mean and a covariance function, the latter describing the dependencies k(x, x′) = cov(f(x), f(x′)) between function values as a function of the corresponding inputs x and x′. A common assumption when specifying a GP prior is stationarity, i.e. that the covariance between function values depends only on the distance |x − x′|, not on the location of the inputs. It is far more difficult to specify a GP prior that allows the function to have different properties in different parts of the input space. In this work we describe new techniques for non-parametric Bayesian regression for functions, e.g. discontinuous ones, for which the stationarity assumption does not hold.

Several approaches to specifying nonstationary GP models can be found in the literature. Sampson and Guttorp [2] propose to use multidimensional scaling for spatio-temporal processes, mapping a nonstationary spatial process into a latent space in which the problem becomes approximately stationary. Schmidt and O'Hagan [3] pick up this idea and use GPs to implement the mapping. Compared to a direct definition of a nonstationary covariance function, as proposed in [4], the detour via a latent space is advantageous because it guarantees positive definiteness of the covariance between observations in the original space and allows an intuitive interpretation of the problem.

In this work we propose to augment the input space R^D by a latent extra input which we infer from the data. When thinking of regression for discontinuous functions, the extra input could tear apart regions of the input space that are separated by abrupt changes of the function values. The idea of adding an extra dimension to the input space is strongly related to the so-called Mixture of Local Experts (MoE) models described in [5, 6, 7, 8], where several independent GPs, the experts, are used to explain the data in different regions of the input space. In this framework a gating network assigns responsibilities to the experts, defining a mapping from the known inputs x to the class associations. We close the gap between a mixture of independent experts and a single GP using the fact that the latent associations to the experts can be seen as a discretized latent input.

In the following we present two approaches for approximate Bayesian inference in GP models that implement nonstationarity through an augmented input space. The first method is inspired by the MoE view with a discrete latent input and is implemented in an MCMC sampling scheme, whereas the second method estimates a continuous latent mapping by evidence maximization.

Nonstationarity by Augmentation

Let D = {(x_1, y_1), . . . , (x_N, y_N)} denote N training samples, where y_i ∈ R is a target and x_i ∈ R^D is the corresponding D-dimensional input vector. The standard GP regression model assumes the relation y_i = f(x_i) + ε via a latent function f, where the observational noise ε is normally distributed. The key idea is to place a Gaussian Process prior on f and to make inference about the latent function directly.
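For concreteness, the following is a minimal NumPy sketch of this standard stationary GP regression model; the kernel parameters v and w, the noise variance, and the toy data are illustrative choices, not values from the paper. It computes the posterior mean K_* (K + σ²I)⁻¹ y at test inputs.

```python
import numpy as np

def se_kernel(X1, X2, v=1.0, w=1.0):
    # k(x, x') = v * exp(-0.5 * sum_d ((x_d - x'_d) / w_d)^2);
    # w may be a scalar or a length-D array of per-dimension lengthscales.
    diffs = (X1[:, None, :] - X2[None, :, :]) / w
    return v * np.exp(-0.5 * np.sum(diffs ** 2, axis=-1))

def gp_posterior_mean(X, y, X_star, noise_var=0.1):
    # Posterior mean E[f(X_star) | D] under y_i = f(x_i) + eps,
    # eps ~ N(0, noise_var): K_* (K + noise_var * I)^{-1} y.
    K = se_kernel(X, X) + noise_var * np.eye(len(X))
    return se_kernel(X_star, X) @ np.linalg.solve(K, y)

# Toy 1-D example (hypothetical data).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 10)[:, None]
y = np.sin(4.0 * X[:, 0]) + 0.05 * rng.standard_normal(10)
X_star = np.linspace(0.0, 1.0, 50)[:, None]
mean = gp_posterior_mean(X, y, X_star)
```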
Below, all parameters of the covariance function and the likelihood are collected in a vector θ. Assume for example that we use the common squared exponential covariance function

\[
k(\mathbf{x}, \mathbf{x}') = v \exp\Big\{ -\tfrac{1}{2} \sum_d w_d^{-2} (x_d - x'_d)^2 \Big\}
\]

and extend the inputs by a latent variable ℓ. The covariance function of a GP in this augmented input space x̃ = (x, ℓ) reads

\[
\tilde{k}(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}') = k(\mathbf{x}, \mathbf{x}') \exp\Big( -\tfrac{1}{2} \Big( \frac{\ell - \ell'}{w_0} \Big)^2 \Big). \tag{1}
\]
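To illustrate the mechanism behind covariance (1), the sketch below (parameter values are hypothetical; in practice the latent inputs ℓ would be inferred, not fixed by hand) evaluates the augmented kernel for two points that coincide in the observed input space but are assigned distant latent values. Their covariance drops to nearly zero, which is exactly how the latent dimension can tear apart regions separated by a discontinuity.

```python
import numpy as np

def augmented_kernel(X1, l1, X2, l2, v=1.0, w=1.0, w0=1.0):
    # Covariance (1) on the augmented inputs (x, l): the stationary
    # SE kernel on the observed inputs, multiplied by an SE factor
    # on the latent extra input with lengthscale w0.
    diffs = (X1[:, None, :] - X2[None, :, :]) / w
    k_obs = v * np.exp(-0.5 * np.sum(diffs ** 2, axis=-1))
    k_lat = np.exp(-0.5 * ((l1[:, None] - l2[None, :]) / w0) ** 2)
    return k_obs * k_lat

# Two points with identical observed input x but distant latent values l:
X = np.array([[0.5], [0.5]])
l = np.array([0.0, 5.0])
print(augmented_kernel(X, l, X, l))  # off-diagonal ~ 0: nearly uncorrelated
```

The off-diagonal entry is exp(−12.5) ≈ 4·10⁻⁶, so the two function values are effectively independent despite sharing the same observed input.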