An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration.

Wavelength selection is a critical step for improving the prediction performance of models built on spectral data. Because vibrational and rotational spectra consist of continuous spectral bands, we propose a novel wavelength interval selection method based on random frog, called interval random frog (iRF). To obtain all possible continuous intervals, the spectra are first divided into intervals by moving a window of fixed width over the whole spectrum. These overlapping intervals are then ranked by random frog coupled with partial least squares (PLS), and the optimal ones are chosen. The method has been applied to two near-infrared spectral datasets, where it selected wavelength intervals more efficiently than the other methods compared. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
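The moving-window step can be illustrated with a minimal Python sketch. It only enumerates the overlapping fixed-width intervals; it does not implement the random frog ranking or the PLS modeling. The function name `moving_window_intervals` and the step size of one wavelength are assumptions made for illustration and are not taken from the iRF source code.

```python
def moving_window_intervals(n_wavelengths, width):
    """List every overlapping interval of a fixed width.

    Each interval is a (start, stop) pair of wavelength indices, with stop
    exclusive. The window is assumed to slide one wavelength at a time.
    """
    if not 1 <= width <= n_wavelengths:
        raise ValueError("width must be between 1 and n_wavelengths")
    return [(start, start + width)
            for start in range(n_wavelengths - width + 1)]


# Hypothetical example: a spectrum with 10 wavelengths and a window of width 4.
intervals = moving_window_intervals(n_wavelengths=10, width=4)
print(len(intervals))               # 7
print(intervals[0], intervals[-1])  # (0, 4) (6, 10)
```

With a step of one wavelength, a spectrum of p wavelengths and a window of width w yields p - w + 1 candidate intervals, which are the objects that would then be ranked.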
