A new and efficient variable selection algorithm based on ant colony optimization. Applications to near infrared spectroscopy/partial least-squares analysis.

A new variable selection algorithm based on ant colony optimization (ACO) is described. The algorithm aims to choose, from a large number of available spectral wavelengths, those relevant to the estimation of analyte concentrations or sample properties when spectroscopic analysis is combined with multivariate calibration techniques such as partial least-squares (PLS) regression. The new algorithm employs the concept of cooperative pheromone accumulation, which is typical of ACO selection methods, and optimizes PLS models using a pre-defined number of variables, employing a Monte Carlo approach to discard irrelevant sensors. Its performance was tested on a simulated system, where it proved significantly superior to other commonly employed selection methods, such as genetic algorithms. The ACO algorithm was also applied to several experimental near infrared spectroscopic data sets, where PLS yielded improved analytical figures of merit after wavelength selection. The method could also be helpful in other chemometric tasks, such as classification or quantitative structure-activity relationship (QSAR) problems.
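The general idea of pheromone-guided selection of a fixed number of variables can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pls1_fit_predict`, `aco_select`), the hold-out validation split, the deposit rule (inverse validation RMSE) and all default parameter values are assumptions made for illustration only, and the PLS step is a bare-bones single-response NIPALS routine written with NumPy.

```python
import numpy as np

def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Minimal PLS1 (NIPALS) fit on training data, then prediction on test data."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    X, y = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y                       # weight vector from X-y covariance
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # nothing left to model
            break
        w = w / norm
        t = X @ w                         # scores
        tt = t @ t
        p = X.T @ t / tt                  # X loadings
        c = y @ t / tt                    # y loading
        X = X - np.outer(t, p)            # deflate X and y
        y = y - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.full(X_test.shape[0], y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original variables
    return (X_test - x_mean) @ b + y_mean

def aco_select(X, y, n_select, n_ants=20, n_iter=30,
               n_components=2, evaporation=0.3, seed=0):
    """Illustrative ACO-style selection of n_select variables (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    idx = rng.permutation(n_samples)      # assumed hold-out split: 1/3 validation
    val, cal = idx[: n_samples // 3], idx[n_samples // 3:]
    pheromone = np.ones(n_vars)           # uniform initial pheromone
    best_subset, best_rmse = None, np.inf
    for _ in range(n_iter):
        deposits = np.zeros(n_vars)
        for _ in range(n_ants):
            # Each ant samples variables at random, weighted by pheromone.
            prob = pheromone / pheromone.sum()
            subset = rng.choice(n_vars, size=n_select, replace=False, p=prob)
            pred = pls1_fit_predict(X[cal][:, subset], y[cal],
                                    X[val][:, subset],
                                    min(n_components, n_select))
            rmse = np.sqrt(np.mean((pred - y[val]) ** 2))
            deposits[subset] += 1.0 / (rmse + 1e-12)   # better models deposit more
            if rmse < best_rmse:
                best_subset, best_rmse = np.sort(subset), rmse
        # Evaporate old pheromone, then add the colony's averaged deposits.
        pheromone = (1.0 - evaporation) * pheromone + deposits / n_ants
    return best_subset, best_rmse, pheromone
```

Calling `aco_select(X, y, n_select=5)` on a calibration matrix `X` (samples by wavelengths) and a response vector `y` returns the best subset found, its validation RMSE and the final pheromone vector. In this sketch, variables that repeatedly appear in well-performing models accumulate pheromone and are sampled more often in later iterations, which is the cooperative effect the abstract refers to.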
