Gaussian Process Regression

Gaussian processes

Suppose we have a random variable y, for example the protein content of flour samples, which depends on another variable x, such as the near-infrared (NIR) spectra of the samples. We show this dependence by writing y as a function of x, y(x). Now suppose we have n samples with spectra x1, ..., xn. Then we might be willing to assume that the joint distribution of y(x1), ..., y(xn) is multivariate normal (Gaussian). If this holds for every n and every choice of x1, ..., xn, i.e. for any set of samples we might observe now or in the future, then y is said to follow a Gaussian process (GP). To specify a multivariate normal distribution we need a mean vector and a covariance matrix. We can simplify things by centring all our data and assuming a mean of zero, in which case the joint distribution is fully defined once we specify the n x n covariance matrix C. This matrix will in general depend on the values of x1, ..., xn, and specifying the form of this dependence lies at the core of Gaussian process regression (GPR).
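To make the role of the covariance matrix concrete, the following minimal sketch builds C from a set of inputs and draws one sample of y(x1), ..., y(xn) from the zero-mean multivariate normal distribution. The text above does not fix the form of the dependence of C on x, so the squared-exponential covariance used here, its parameters length_scale and signal_var, and the synthetic three-dimensional inputs standing in for spectra are all illustrative assumptions rather than the choices made in this paper.

```python
import math
import random


def squared_exponential(xi, xj, length_scale=1.0, signal_var=1.0):
    """Assumed covariance between y(xi) and y(xj); one common choice of many."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def covariance_matrix(xs, kernel=squared_exponential):
    """Build the n x n matrix C with entries C[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in xs] for xi in xs]


def cholesky(C, jitter=1e-10):
    """Lower-triangular L with L L^T = C + jitter * I (jitter aids stability)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] + jitter - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def sample_zero_mean_gp(xs, rng, kernel=squared_exponential):
    """Draw y = L z with z ~ N(0, I), so that y ~ N(0, C)."""
    C = covariance_matrix(xs, kernel)
    L = cholesky(C)
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]
    return y, C


# Synthetic inputs: 5 "samples", each a 3-dimensional stand-in for a spectrum.
rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
y, C = sample_zero_mean_gp(xs, rng)
```

With this assumed covariance, samples whose inputs are close together get a covariance near signal_var and therefore tend to have similar y values, while distant samples are nearly uncorrelated. Swapping in a different kernel function changes that behaviour without touching the rest of the sketch.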