Genetic algorithms applied to the selection of factors in principal component regression

Abstract: Using principal component regression (PCR) as a multivariate calibration tool always raises the question of which subset of factors, i.e. principal components (PCs), gives the best calibration model. Factor selection is normally based on deterministic methods such as top-down procedures, forward-backward stepwise variable selection, or correlated principal component regression (CPCR). In contrast, this paper applies a stochastic method, a genetic algorithm (GA), to factor selection. A new kind of fitness function is used that combines the prediction errors of the calibration set and of an independent validation set. The performance of eigenvalue ranking and correlation ranking is compared. A general statistical criterion for judging the significance of differences between individual calibration models is introduced. In this context, it is shown that the uncertainties of the standard deviations representing the prediction errors follow a very simple approximation formula that includes only the number of standards. For the applications considered, the GA gives results very close to the CPCR solutions.
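To make the idea concrete, the following is a minimal sketch of GA-based factor selection for PCR, not the authors' implementation. Each individual is a boolean mask over the first `n_pcs` principal components, and the fitness is a combination of the calibration and validation prediction errors. The function names `make_fitness` and `ga_select`, the equal weighting of the two errors, and the GA operators (binary tournament, uniform crossover, bit-flip mutation, elitism) are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def make_fitness(X_cal, y_cal, X_val, y_val, n_pcs):
    """Return a fitness function over boolean PC masks (lower is better)."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    # PCA via SVD of the centered calibration data; rows of Vt are loadings,
    # ordered by decreasing singular value (i.e. eigenvalue ranking).
    _, _, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
    P = Vt[:n_pcs].T
    T_cal = (X_cal - x_mean) @ P   # calibration scores
    T_val = (X_val - x_mean) @ P   # validation scores, same loadings

    def fitness(mask):
        if not mask.any():
            return np.inf          # an empty model is never acceptable
        # Least-squares regression of y on the selected scores only
        b, *_ = np.linalg.lstsq(T_cal[:, mask], y_cal - y_mean, rcond=None)
        rmse_cal = np.sqrt(np.mean((T_cal[:, mask] @ b + y_mean - y_cal) ** 2))
        rmse_val = np.sqrt(np.mean((T_val[:, mask] @ b + y_mean - y_val) ** 2))
        # ASSUMPTION: equal weighting of both errors; the paper's exact
        # combination is not given in the abstract.
        return 0.5 * (rmse_cal + rmse_val)

    return fitness

def ga_select(fitness, n_pcs, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    """Search for a PC subset with a simple generational GA."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_pcs)) < 0.5
    # Start from the full model so a valid solution always exists
    best_mask = np.ones(n_pcs, dtype=bool)
    best_score = fitness(best_mask)
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        i = scores.argmin()
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), scores[i]
        # Binary tournament selection
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(scores[a] <= scores[b], a, b)]
        # Uniform crossover between each parent and its neighbour
        partners = np.roll(parents, 1, axis=0)
        take_parent = rng.random(parents.shape) < 0.5
        children = np.where(take_parent, parents, partners)
        # Bit-flip mutation
        children ^= rng.random(children.shape) < p_mut
        children[0] = best_mask    # elitism: keep the best mask found so far
        pop = children
    return best_mask, best_score
```

A typical call would be `fit = make_fitness(X_cal, y_cal, X_val, y_val, n_pcs=10)` followed by `mask, score = ga_select(fit, n_pcs=10)`, where `mask` flags the selected PCs in eigenvalue order. Because the validation set enters the fitness, it no longer gives an unbiased error estimate for the selected model, so a separate test set would be needed for that purpose.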
