A simple idea on applying large regression coefficient to improve the genetic algorithm-PLS for variable selection in multivariate calibration

Abstract The genetic algorithm (GA) coupled with partial least squares (PLS) has been successfully applied to variable selection in multivariate calibration. Because a large PLS regression coefficient indicates an important variable, this study presents a new and simple idea: the structure of a proportion of the chromosomes in the initial population is determined by the large regression coefficients. The regression coefficients are obtained by building a PLS model on the autoscaled data. With this modification, the GA-PLS method not only steers the optimization more effectively toward the optimal solution but also still obeys the rules of GAs. Results on one simulated dataset and two near-infrared datasets show that the modified method substantially improves variable selection compared with the original GA-PLS.
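
The seeding idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the coefficients shape the chromosomes, so the function names (`pls1_coefficients`, `seeded_population`) and the parameters `seeded_fraction`, `top_k`, and `p_random` are hypothetical, and the rule used here (each seeded chromosome is a random subset of the variables with the largest absolute coefficients) is only one plausible reading. The PLS coefficients are computed with a standard NIPALS PLS1 fit on autoscaled data.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model (NIPALS) fitted on autoscaled data."""
    # Autoscale: mean-center and scale every variable (and y) to unit variance.
    Xr = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yr = (y - y.mean()) / y.std(ddof=1)
    n_vars = Xr.shape[1]
    W = np.zeros((n_vars, n_components))  # weight vectors
    P = np.zeros((n_vars, n_components))  # X loadings
    q = np.zeros(n_components)            # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                        # score vector
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)          # deflate X
        yr = yr - q[a] * t                # deflate y
        W[:, a], P[:, a] = w, p
    # b = W (P^T W)^{-1} q, expressed in autoscaled units.
    return W @ np.linalg.solve(P.T @ W, q)


def seeded_population(coefs, pop_size, seeded_fraction=0.5, top_k=10,
                      p_random=0.1, rng=None):
    """Binary initial population in which a fraction of chromosomes is seeded.

    Assumption: a seeded chromosome selects a random subset of the top_k
    variables ranked by |coefficient|; the other chromosomes are purely random,
    so the GA keeps its usual stochastic starting diversity.
    """
    rng = np.random.default_rng(rng)
    n_vars = coefs.size
    top_k = min(top_k, n_vars)
    population = rng.random((pop_size, n_vars)) < p_random
    top = np.argsort(np.abs(coefs))[-top_k:]   # indices of the largest |b|
    n_seeded = int(round(seeded_fraction * pop_size))
    for i in range(n_seeded):
        keep = top[rng.random(top_k) < 0.5]
        if keep.size == 0:                     # never seed an empty chromosome
            keep = top[-1:]
        population[i] = False
        population[i, keep] = True
    return population.astype(int)
```

The returned array has one row per chromosome and one column per variable (1 means the variable is selected). It would replace only the random initialization step of an ordinary GA-PLS run; selection, crossover, and mutation are unchanged.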
