Data Imputation and Robust Training with Gaussian Processes

When training a regression model from observations, it is often assumed that only the outputs are noisy. When the inputs are also known (or suspected) to be corrupted, the challenge is to account for this uncertainty properly. In all but the simplest models, integrating out the inputs is intractable, even when the true input distribution is known. We present an approach for Gaussian Process models that simultaneously accounts for input uncertainty, thereby improving future predictions, and estimates the true values of the noisy inputs. Our algorithm is based on lower-bounding the true marginal likelihood and takes the form of an expectation-maximization procedure, alternately updating the model parameters and adjusting the estimates of the cleaned input points.
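
The alternating scheme described above can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it stands in for the derived lower bound with a simple surrogate objective (the standard GP log marginal likelihood plus a Gaussian penalty tying the estimated inputs to the observed noisy ones), and all names (`alternate_em`, `input_noise_var`, etc.) are illustrative assumptions rather than quantities defined in the text.

```python
# Hedged sketch of an EM-style alternation for GP regression with noisy inputs.
# The true method lower-bounds the marginal likelihood; here a simple penalized
# likelihood is used purely for illustration.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Z, log_lengthscale, log_variance):
    """Squared-exponential kernel with log-parameterized hyperparameters."""
    ell, var = np.exp(log_lengthscale), np.exp(log_variance)
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2 / ell ** 2)

def neg_log_marginal_likelihood(params, X, y):
    """Standard GP negative log marginal likelihood for fixed inputs X."""
    log_ell, log_var, log_noise = params
    n = X.shape[0]
    K = rbf_kernel(X, X, log_ell, log_var) + (np.exp(log_noise) + 1e-6) * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)

def input_objective(X_flat, params, X_obs, y, input_noise_var):
    """Fit term plus a Gaussian penalty pulling the estimated (clean) inputs
    toward the observed noisy inputs -- an ad-hoc stand-in for the bound."""
    X = X_flat.reshape(X_obs.shape)
    penalty = 0.5 * np.sum((X - X_obs) ** 2) / input_noise_var
    return neg_log_marginal_likelihood(params, X, y) + penalty

def alternate_em(X_obs, y, input_noise_var=0.05, n_iters=10):
    """Alternately update GP hyperparameters (M-like step) and the
    estimates of the cleaned inputs (E-like step)."""
    X_est = X_obs.copy()
    params = np.zeros(3)  # log lengthscale, log signal variance, log noise variance
    for _ in range(n_iters):
        # M-like step: hyperparameters given the current input estimates.
        params = minimize(neg_log_marginal_likelihood, params, args=(X_est, y)).x
        # E-like step: refine the input estimates given the hyperparameters.
        res = minimize(input_objective, X_est.ravel(),
                       args=(params, X_obs, y, input_noise_var),
                       options={"maxiter": 50})
        X_est = res.x.reshape(X_obs.shape)
    return params, X_est

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_true = np.linspace(0.0, 5.0, 30)[:, None]
    y = np.sin(X_true).ravel() + 0.1 * rng.standard_normal(30)
    X_noisy = X_true + 0.2 * rng.standard_normal(X_true.shape)  # corrupted inputs
    params, X_clean = alternate_em(X_noisy, y)
    print("learned log-hyperparameters:", params)
```

In the paper's procedure, both the parameter updates and the input estimates would be driven by the lower bound on the marginal likelihood rather than by the ad-hoc Gaussian penalty used in this sketch.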