Data augmentation: an alternative approach to the analysis of spectroscopic data

Abstract. The need for inferential models capable of accurately predicting product qualities has never been greater in the chemical process industry. However, because of production limitations and the pressure to reduce costs, obtaining enough relevant data to derive accurate and robust calibration models is a major challenge. The difficulty stems from the intrinsic sparsity of process data: the number of objects available (e.g., batches) is small compared with the large number of process variables (wavelengths) measured. This paper examines a method for developing a robust calibration model in which Partial Least Squares (PLS) is applied to a database that has been augmented by adding Gaussian noise to the original data. Adding Gaussian noise to the process variables alone is shown to decrease the prediction error as a consequence of the increased data density.
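
To make the augmentation step concrete, the following is a minimal sketch of how noisy replicates might be generated before calibration. It is not the authors' implementation. The function name, the number of replicates per sample (n_copies), the noise standard deviation (noise_sd), and the toy data are all illustrative assumptions. The abstract does not specify how the noise level is chosen. Consistent with the abstract, noise is added to the process variables only, and each replicate keeps the reference value of the sample it was derived from.

```python
import random


def augment_with_gaussian_noise(X, y, n_copies=5, noise_sd=0.01, seed=0):
    """Return the original samples followed by noisy replicates.

    X: list of spectra, each a list of floats (one value per wavelength).
    y: list of reference quality values, one per spectrum.
    Noise is added to X only; each replicate keeps its parent's y value.
    n_copies, noise_sd and seed are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    X_aug = [list(row) for row in X]  # keep the original samples first
    y_aug = list(y)
    for row, target in zip(X, y):
        for _ in range(n_copies):
            # Zero-mean Gaussian noise, drawn independently per wavelength.
            X_aug.append([value + rng.gauss(0.0, noise_sd) for value in row])
            y_aug.append(target)
    return X_aug, y_aug


# Toy data: 3 batches measured at 4 wavelengths (made-up numbers).
X = [[0.10, 0.20, 0.30, 0.40],
     [0.15, 0.25, 0.35, 0.45],
     [0.20, 0.30, 0.40, 0.50]]
y = [1.0, 1.5, 2.0]

X_aug, y_aug = augment_with_gaussian_noise(X, y, n_copies=5, noise_sd=0.01)
print(len(X_aug), len(y_aug))  # 18 18 (3 originals + 3 * 5 replicates)
```

The augmented pair (X_aug, y_aug) would then be passed to whichever PLS implementation is used for calibration. In practice the noise level would presumably need to be tuned against the measurement noise of the instrument, which this sketch does not attempt.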