Robustness of PARAFAC and N-PLS regression models in relation to homoscedastic and heteroscedastic noise

Abstract In this study, the robustness of parallel factor analysis (PARAFAC) and N-way partial least squares (N-PLS) regression models was investigated in relation to homoscedastic and heteroscedastic noise (flicker noise), using the Claus data with simulated noise added. The Claus data were loaded from the N-way toolbox for MATLAB [C.A. Andersson, R. Bro, The N-way Toolbox for MATLAB, Chemom. Intell. Lab. Syst. 52 (2000) 1]. The data consisted of five samples, 201 emission wavelengths and 61 excitation wavelengths. Simulated homoscedastic and heteroscedastic noise was added to the original data and the predictive ability of the models was studied. The results showed that the data and the models are robust with respect to these types of noise (without correlation). One reason for the robustness of the models might be the large number of data points in the original data. This possibility was examined by constructing three-way arrays from the original data with lower dimensions in the excitation wavelengths. Three-way arrays were created with dimensions of 5 × 201 × 31, 5 × 201 × 16, 5 × 201 × 8 and 5 × 201 × 4. The performance of each model was evaluated by calculating the root mean squared error of cross validation (RMSECV) for the analytes using the leave-one-sample-out method. The results of the N-PLS models showed that the RMSECV values were enhanced by decreasing the dimensions of both the original data and the same data with the simulated noise. However, the RMSECV changes for the noisy data are much larger than those for the original data. For the different three-way arrays, with or without noise, the N-PLS models gave better results than the PARAFAC models.
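The noise simulation, the reduction of the excitation dimension and the RMSECV calculation described above can be illustrated with a minimal sketch. It is not the authors' procedure: the array `X` is a synthetic placeholder with the stated 5 × 201 × 61 dimensions (not the Claus data), the noise levels `sigma` and `rel_sigma` are arbitrary, heteroscedastic noise is assumed to have a standard deviation proportional to the signal, and the reduced arrays are assumed to be obtained by keeping every 2nd, 4th, 8th or 16th excitation wavelength. All function names are hypothetical. RMSECV is taken as the square root of the mean squared difference between reference values and the leave-one-sample-out predictions.

```python
import math
import random


def add_homoscedastic(x, sigma, rng):
    """Add zero-mean Gaussian noise with one constant standard deviation."""
    return [[[v + rng.gauss(0.0, sigma) for v in row] for row in sample]
            for sample in x]


def add_heteroscedastic(x, rel_sigma, rng):
    """Add zero-mean Gaussian noise whose standard deviation scales with |signal|.

    Assumption: proportional noise is used here as a stand-in for flicker noise.
    """
    return [[[v + rng.gauss(0.0, rel_sigma * abs(v)) for v in row]
             for row in sample] for sample in x]


def subsample_excitation(x, step):
    """Keep every `step`-th point along the third (excitation) mode."""
    return [[row[::step] for row in sample] for sample in x]


def shape(x):
    """Return (samples, emission points, excitation points) of a nested list."""
    return (len(x), len(x[0]), len(x[0][0]))


def rmsecv(y_ref, y_cv):
    """Root mean squared error between reference and cross-validated values."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_cv)) / n)


rng = random.Random(0)

# Synthetic placeholder: 5 samples x 201 emission x 61 excitation wavelengths.
X = [[[(i + 1)
       * math.exp(-((j - 100) / 40.0) ** 2)
       * math.exp(-((k - 30) / 15.0) ** 2)
       for k in range(61)]
      for j in range(201)]
     for i in range(5)]

X_homo = add_homoscedastic(X, sigma=0.05, rng=rng)
X_hetero = add_heteroscedastic(X, rel_sigma=0.05, rng=rng)

# Steps 2, 4, 8 and 16 give 31, 16, 8 and 4 excitation points respectively.
reduced = {step: subsample_excitation(X_hetero, step) for step in (2, 4, 8, 16)}
for step, arr in reduced.items():
    print(step, shape(arr))
```

In a full study, each reduced array would be passed to a PARAFAC or N-PLS routine inside a leave-one-sample-out loop, and the predicted concentrations collected from that loop would be supplied to `rmsecv` together with the reference values.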