Dynamic determination of the dimension of PCA calibration models using F‐statistics

Owing to experimental measurement errors, determining the proper dimension of calibration models is difficult. Cross‐validation is a common approach for this purpose; however, if data evaluation is based on PCA only, without consideration of sample concentrations, this computationally expensive method cannot be applied. In this study a statistical method for determining the proper dimension of PCA calibration models is presented from the viewpoint of multivariate regression analysis, considering only measured data. In this iterative algorithm, individual principal components (PCs) are included stepwise in a reduced model, which is subsequently tested against the full model including all PCs. The comparison is performed by an F‐test on the estimates of the residual variance of a measurement spectrum obtained from the reduced and the full model, and thus detects a lack of fit due to an insufficient number of PCs. If no lack of fit is evident for a certain reduced model, a sufficiently large model is considered to have been found and the inclusion of additional PCs is stopped. Hence the resulting reduced calibration model includes only statistically significant PCs and yields the minimum number of PCs required for a given measurement spectrum. The algorithm can be applied individually to every measured data vector, such as an optical spectrum of a chemical analyte, for optimized data evaluation. The proposed algorithm is initially investigated using simulated data and subsequently applied to three different experimental sets of spectra. It is shown that for synthetic data at reasonable noise levels the correct number of PCs can be determined in most cases. The experimental examples demonstrate that the number of PCs determined by the proposed algorithm is slightly larger than a user would select manually by subjective visual inspection. As a result, the algorithm is able to detect small but significant spectroscopic features of experimental data which would otherwise be neglected. Copyright © 2003 John Wiley & Sons, Ltd.
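
The stepwise procedure described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation, and it rests on several assumptions: the PC loading vectors are orthonormal and ordered by importance, and the lack of fit is tested with the standard extra‐sum‐of‐squares form of the F statistic, since the abstract does not give the exact statistic. The names `select_num_pcs`, `f_sf` and `alpha` are hypothetical. To keep the sketch free of dependencies, the F‐distribution tail probability is computed from the regularized incomplete beta function using a textbook continued fraction; a library routine such as `scipy.stats.f.sf` could be used instead.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f_value, dfn, dfd):
    """Upper-tail probability P(F > f_value) of an F(dfn, dfd) distribution."""
    x = dfd / (dfd + dfn * f_value)
    return _betai(0.5 * dfd, 0.5 * dfn, x)


def select_num_pcs(spectrum, pcs, alpha=0.05):
    """Return the smallest number of PCs showing no significant lack of fit.

    spectrum: measured data vector (n channels).
    pcs:      list of K loading vectors, each of length n, assumed
              orthonormal and ordered by importance (assumption).
    alpha:    significance level of the F-test (hypothetical default).
    """
    n, n_pcs = len(spectrum), len(pcs)
    if n <= n_pcs:
        raise ValueError("need more channels than principal components")

    # Scores of the spectrum on each PC (projection onto the loadings).
    scores = [sum(p * y for p, y in zip(pc, spectrum)) for pc in pcs]

    # Residual of the full model (all PCs) and its variance estimate.
    residual = list(spectrum)
    for t, pc in zip(scores, pcs):
        residual = [r - t * p for r, p in zip(residual, pc)]
    var_full = sum(r * r for r in residual) / (n - n_pcs)
    if var_full <= 0.0:
        raise ValueError("full model leaves no residual variance to test against")

    # Include PCs one at a time; stop once the reduced model fits.
    for k in range(n_pcs):
        # With orthonormal PCs, the extra residual sum of squares of the
        # k-PC model relative to the full model is the sum of the squared
        # scores of the PCs that were left out.
        extra_ss = sum(t * t for t in scores[k:])
        f_value = (extra_ss / (n_pcs - k)) / var_full
        if f_sf(f_value, n_pcs - k, n - n_pcs) > alpha:
            return k  # no significant lack of fit
    return n_pcs
```

For example, with four candidate PCs taken as the first four unit vectors of an eight‐channel space, a spectrum such as `[10, 5, 0, 0, 0.1, -0.1, 0.1, -0.1]` has large scores only on the first two PCs, so the loop stops after including two of them. Because the test is run per spectrum, different spectra evaluated against the same PC set may end up with different model dimensions.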
