Gaussian process priors with ARMA noise models

We extend the standard covariance function used in the Gaussian Process prior nonparametric modelling approach to include correlated (ARMA) noise models. The improvement in performance is illustrated on simulation examples of data generated by nonlinear static functions corrupted with additive ARMA noise.

1 Gaussian Process priors

In recent years many flexible parametric and semi-parametric approaches to the empirical identification of nonlinear systems have been used. In this paper we use nonparametric models, which retain the available data and perform inference conditional on the current state and local data (called 'smoothing' in some frameworks). This direct use of the data has potential advantages in many control contexts. The uncertainty of model predictions can be made dependent on local data density, and the model complexity is automatically related to the amount of available data (more complex models need more evidence to make them likely).

The nonparametric model used in this paper is a Gaussian Process prior, as developed by O'Hagan [1] and reviewed in [2, 3]. An application to modelling a system within a control context is described in [4], and further developments relating to the use of Gaussian Process priors in gain scheduling are described in [5]. Most previously published work has focused on regression tasks with independent, identically distributed noise. Input-dependent noise is described in [6], but we are not aware of previous work with coloured noise covariance functions in Gaussian Process priors.

This paper shows how knowledge about the correlation structure of additive unmeasured noise or disturbances can be incorporated into the model. This improves the performance of the model in finding optimal parameters for describing the deterministic aspects of the system, and can be used to make online predictions more accurate. We expect this to make Gaussian Process priors more attractive for use in control and signal processing contexts.

2 Modelling with GPs

We assume that we are modelling an unknown nonlinear system $f(x)$, with known inputs $x$, using observed outputs $y$. These have been corrupted by an additive discrete-time process $\epsilon(t)$. Here we assume that $f(x_i)$ and $\epsilon_i$ are independent. Let $y = [y_1, \ldots, y_n]^T$ be a set of observed data, or targets, such that

$$y_i = f(x_i) + \epsilon_i, \quad i = 1, \ldots, n. \qquad (1)$$

2.1 The Gaussian Process prior approach

A prior is placed directly on the space of functions for modelling the above system. We assume that the values of the function $f(x)$ at inputs $x_1, \ldots, x_n$, with outputs $y_1, \ldots, y_n$, constitute a set of random variables which have a joint $n$-dimensional multivariate Normal distribution. The Gaussian Process is then fully specified by its mean and covariance function $C(x_i, x_j)$. We write

$$(y_1, \ldots, y_n)^T \sim \mathcal{N}(0, \Sigma), \qquad (2)$$

where $\Sigma$ is the covariance matrix whose entries $\Sigma_{ij}$ are given by $C(x_i, x_j)$. We now have a prior distribution for the target values which is a multivariate Normal:

$$p(y \mid x) = (2\pi)^{-n/2} \, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} y^T \Sigma^{-1} y\right). \qquad (3)$$

1 See a standard text such as [7] for a discussion of disturbance models in the linear system identification context.

2 In what follows, we assume a zero mean process.
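
As an illustration of the zero-mean multivariate Normal prior in equations (2) and (3), the following minimal Python sketch builds a covariance matrix and evaluates the log of the prior density. The text above does not specify a covariance function, so the squared-exponential form, its parameters `v` and `length`, the white-noise variance of 0.1, and the function names `sq_exp_cov` and `gp_log_prior` are all assumptions made for this example only.

```python
import numpy as np


def sq_exp_cov(x, v=1.0, length=1.0):
    """Assumed squared-exponential covariance C(x_i, x_j).

    This kernel is a placeholder; the text above does not fix a
    particular covariance function.
    """
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]          # pairwise input differences
    return v * np.exp(-0.5 * (d / length) ** 2)


def gp_log_prior(y, Sigma):
    """Log of the zero-mean Normal density in equation (3)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
    alpha = np.linalg.solve(L, y)        # alpha^T alpha = y^T Sigma^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * (alpha @ alpha)


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 20)

# Assumption: i.i.d. noise with variance 0.1 added on the diagonal.
Sigma = sq_exp_cov(x) + 0.1 * np.eye(len(x))

# Draw one vector of targets from the prior of equation (2).
y = rng.multivariate_normal(np.zeros(len(x)), Sigma)
print(gp_log_prior(y, Sigma))
```

Working with the Cholesky factor avoids forming the inverse of the covariance matrix explicitly, which is usually the numerically safer way to evaluate both the quadratic form and the determinant in equation (3).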