A novel algorithm for spectral interval combination optimization.

In this study, a new wavelength interval selection algorithm, interval combination optimization (ICO), is proposed under the framework of model population analysis (MPA). The full spectrum is first divided into a fixed number of equal-width intervals. The optimal interval combination is then searched iteratively under the guidance of MPA in a soft-shrinkage manner, with weighted bootstrap sampling (WBS) employed as the random sampling method. Finally, a local search is conducted to optimize the widths of the selected intervals. Three NIR datasets were used to validate the performance of the ICO algorithm. The results show that ICO selects fewer wavelengths with better prediction performance than four other wavelength selection methods: VISSA, VISSA-iPLS, iVISSA and GA-iPLS. In addition, ICO is computationally economical, benefiting from fewer tuning parameters and faster convergence.
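The first two stages described above (equal-width partitioning, then an iterative WBS search with soft shrinkage) can be illustrated with a minimal sketch. It is not the published implementation. The function names, the default parameter values, the rule that sets each interval's new weight to its frequency among the best sub-models, and the toy `toy_error` function (a stand-in for a real model error computed on the selected wavelengths) are all assumptions made for illustration. The final local search over interval widths is omitted.

```python
import random


def split_intervals(n_wavelengths, n_intervals):
    """Return (start, stop) index pairs for near-equal-width intervals."""
    base, extra = divmod(n_wavelengths, n_intervals)
    bounds, start = [], 0
    for i in range(n_intervals):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def weighted_bootstrap(weights, rng):
    """Draw len(weights) interval indices with replacement, proportional
    to their weights, and keep the unique indices as one combination."""
    n = len(weights)
    drawn = rng.choices(range(n), weights=weights, k=n)
    return sorted(set(drawn))


def ico_sketch(n_intervals, error_fn, n_submodels=200, top_ratio=0.1,
               max_iter=30, seed=0):
    """Iteratively search interval combinations (illustrative only)."""
    rng = random.Random(seed)
    weights = [1.0] * n_intervals          # every interval starts equally likely
    best_combo, best_err = None, float("inf")
    for _ in range(max_iter):
        combos = [weighted_bootstrap(weights, rng) for _ in range(n_submodels)]
        scored = sorted(((error_fn(c), c) for c in combos), key=lambda t: t[0])
        top = scored[:max(1, int(top_ratio * n_submodels))]
        if top[0][0] < best_err:
            best_err, best_combo = top[0]
        # Assumed soft-shrinkage rule: an interval's new weight is its
        # frequency among the best sub-models, so weak intervals fade
        # gradually instead of being removed outright.
        weights = [sum(i in c for _, c in top) / len(top)
                   for i in range(n_intervals)]
        if all(w in (0.0, 1.0) for w in weights):
            break                          # weights have converged
    return best_combo, best_err


def toy_error(combo, informative=(2, 7)):
    """Hypothetical error: large penalty for each missing informative
    interval, small penalty for each uninformative one."""
    missing = sum(i not in combo for i in informative)
    extras = sum(i not in informative for i in combo)
    return 1.0 * missing + 0.1 * extras


if __name__ == "__main__":
    print(split_intervals(n_wavelengths=103, n_intervals=10))
    print(ico_sketch(n_intervals=10, error_fn=toy_error))
```

In practice `error_fn` would map the selected interval indices back to wavelength columns through `split_intervals` and return a cross-validated prediction error; the toy function only makes the sketch runnable without data.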
