A new family of genetic algorithms for wavelength interval selection in multivariate analytical spectroscopy

A new procedure is presented for wavelength interval selection with a genetic algorithm, aimed at improving the predictive ability of partial least squares (PLS) multivariate calibration. It involves separately labelling each of the selected sensor ranges with an appropriate inclusion ranking. The approach is intended to alleviate overfitting without the need to prepare an independent monitoring sample set. A theoretical example is worked out to compare its performance with previous implementations of genetic algorithms. Two experimental data sets are also studied: the target parameters are the concentration of glucuronic acid in complex mixtures studied by Fourier transform mid-infrared spectroscopy, and the octane number of gasolines monitored by near-infrared spectroscopy. Copyright © 2003 John Wiley & Sons, Ltd.
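To make the general idea of interval selection concrete, the following is a minimal sketch of a plain genetic algorithm that switches whole wavelength intervals on or off and scores each candidate by the cross-validated error of a PLS model. It is not the procedure of the paper: the abstract does not describe how the inclusion ranking is computed, so that step is left out. The function names (`pls1_coefficients`, `cv_rmse`, `select_intervals`), the use of cross-validated RMSE as fitness, the GA settings and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Fit single-response PLS (NIPALS) on centered data; return the regression vector."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to model
            break
        w = w / norm
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        qa = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W.append(w); P.append(p); q.append(qa)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def cv_rmse(X, y, mask, width, n_components=3, n_folds=5):
    """Cross-validated RMSE of PLS using only the intervals flagged in `mask`."""
    if not mask.any():
        return np.inf
    Xs = X[:, np.repeat(mask, width)]     # expand interval mask to wavelengths
    folds = np.arange(len(y)) % n_folds   # fixed folds, so fitness is deterministic
    sq_err = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        b = pls1_coefficients(Xs[train], y[train], n_components)
        pred = (Xs[test] - Xs[train].mean(axis=0)) @ b + y[train].mean()
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / len(y))

def select_intervals(X, y, width, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Hypothetical GA: one boolean gene per interval, elitism, one-point crossover."""
    rng = np.random.default_rng(seed)
    n_int = X.shape[1] // width
    pop = rng.random((pop_size, n_int)) < 0.5
    pop[0] = True                         # include the full-spectrum model as a baseline
    for _ in range(generations):
        fit = np.array([cv_rmse(X, y, m, width) for m in pop])
        pop = pop[np.argsort(fit)]        # best (lowest RMSE) first
        children = [pop[0].copy()]        # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = pop[rng.integers(0, pop_size // 2, size=2)]  # parents from better half
            cut = rng.integers(1, n_int)
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_int) < p_mut                  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    fit = np.array([cv_rmse(X, y, m, width) for m in pop])
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Synthetic example (assumed data): only the third interval carries signal.
data_rng = np.random.default_rng(1)
n_samples, width, n_int = 40, 10, 8
X = data_rng.normal(size=(n_samples, width * n_int))
y = X[:, 20:30].sum(axis=1) + 0.05 * data_rng.normal(size=n_samples)
mask, rmse = select_intervals(X, y, width)
full_rmse = cv_rmse(X, y, np.ones(n_int, dtype=bool), width)
print("selected intervals:", np.flatnonzero(mask))
print("CV RMSE, selected vs full spectrum:", rmse, full_rmse)
```

Because the full-spectrum mask is placed in the initial population and the best individual is always carried over, the returned cross-validated error can never exceed that of the full-spectrum model. Optimizing a cross-validated error directly can itself overfit the calibration set, which is the kind of problem the procedure in the abstract is meant to alleviate.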
