A modification of the bootstrapping soft shrinkage approach for spectral variable selection in the issue of over-fitting, model accuracy and variable selection credibility.

In this study, we proposed a new computational method stabilized bootstrapping soft shrinkage approach (SBOSS) for variable selection based on bootstrapping soft shrinkage approach (BOSS) which can enhance the analysis of chemical interest from the massive variables among the overlapped absorption bands. In SBOSS, variable is selected by the index of stability of regression coefficients instead of regression coefficients absolute value. In each loop, a weighted bootstrap sampling (WBS) is applied to generate sub-models, according to the weights update by conducting model population analysis (MPA) on the stability of regression coefficients (RC) of these sub-models. Finally, the subset with the lowest RMSECV is chosen to be the optimal variable set. The performance of the SBOSS was evaluated by one simulated dataset and three NIR datasets. The results show that SBOSS can select the fewer variables and supply the least RMSEP and latent variable number of the PLS model with the best stability comparing with methods of Monte Carlo uninformative variables elimination (MCUVE), genetic algorithm (GA), competitive reweighted sampling (CARS), stability of competitive adaptive reweighted sampling (SCARS) and BOSS.

[1]  R. Leardi,et al.  Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions , 2004 .

[2]  C. Spiegelman,et al.  Theoretical Justification of Wavelength Selection in PLS Calibration:  Development of a New Algorithm. , 1998, Analytical Chemistry.

[3]  Zou Xiaobo,et al.  Variables selection methods in near-infrared spectroscopy. , 2010, Analytica chimica acta.

[4]  Lunzhao Yi,et al.  A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling. , 2014, The Analyst.

[5]  Hongdong Li,et al.  Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. , 2009, Analytica chimica acta.

[6]  R. Yu,et al.  An ensemble of Monte Carlo uninformative variable elimination for wavelength selection. , 2008, Analytica chimica acta.

[7]  Dong Wang,et al.  Successive projections algorithm combined with uninformative variable elimination for spectral variable selection , 2008 .

[8]  M. C. U. Araújo,et al.  The successive projections algorithm for variable selection in spectroscopic multicomponent analysis , 2001 .

[9]  Qing-Song Xu,et al.  Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification. , 2012, Analytica chimica acta.

[10]  Tahir Mehmood,et al.  A review of variable selection methods in Partial Least Squares Regression , 2012 .

[11]  Franco Allegrini,et al.  A new and efficient variable selection algorithm based on ant colony optimization. Applications to near infrared spectroscopy/partial least-squares analysis. , 2011, Analytica chimica acta.

[12]  Elaine Martin,et al.  Bayesian linear regression and variable selection for spectroscopic calibration. , 2009, Analytica chimica acta.

[13]  Jiewen Zhao,et al.  Selection of the efficient wavelength regions in FT-NIR spectroscopy for determination of SSC of ‘Fuji’ apple based on BiPLS and FiPLS models , 2007 .

[14]  S. Tsakovski,et al.  Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation , 2015 .

[15]  Shih-Wei Lin,et al.  Particle swarm optimization for parameter determination and feature selection of support vector machines , 2008, Expert Syst. Appl..

[16]  S. Lanteri,et al.  Selection of useful predictors in multivariate calibration , 2004, Analytical and bioanalytical chemistry.

[17]  S. Wold,et al.  Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data. , 2002, Analytical chemistry.

[18]  S. Engelsen,et al.  Interval Partial Least-Squares Regression (iPLS): A Comparative Chemometric Study with an Example from Near-Infrared Spectroscopy , 2000 .

[19]  Roberto Kawakami Harrop Galvão,et al.  A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents. , 2011, Analytica chimica acta.

[20]  Qing-Song Xu,et al.  Using variable combination population analysis for variable selection in multivariate calibration. , 2015, Analytica chimica acta.

[21]  Kaiyi Zheng,et al.  Stability competitive adaptive reweighted sampling (SCARS) and its applications to multivariate calibration of NIR spectra , 2012 .

[22]  D. Massart,et al.  Elimination of uninformative variables for multivariate calibration. , 1996, Analytical chemistry.

[23]  Dong-Sheng Cao,et al.  A new strategy to prevent over-fitting in partial least squares models based on model population analysis. , 2015, Analytica chimica acta.

[24]  Riccardo Leardi,et al.  Genetic Algorithms as a Tool for Wavelength Selection in Multivariate Calibration , 1995 .

[25]  Bahram Hemmateenejad,et al.  Combination of Ant Colony Optimization with Various Local Search Strategies. A Novel Method for Variable Selection in Multivariate Calibration and QSPR Study , 2009 .

[26]  Dong-Sheng Cao,et al.  Model-population analysis and its applications in chemical and biological modeling , 2012 .

[27]  Dong-Sheng Cao,et al.  Model population analysis for variable selection , 2010 .

[28]  Jian-hui Jiang,et al.  Spectral regions selection to improve prediction ability of PLS models by changeable size moving window partial least squares and searching combination moving window partial least squares , 2004 .

[29]  Dong-Sheng Cao,et al.  A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. , 2014, Analytica chimica acta.

[30]  Stefan Engelen,et al.  MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data , 2012, Nucleic Acids Res..

[31]  Xueguang Shao,et al.  A wavelength selection method based on randomization test for near-infrared spectral analysis , 2009 .

[32]  Alejandro C. Olivieri,et al.  A new family of genetic algorithms for wavelength interval selection in multivariate analytical spectroscopy , 2003 .

[33]  John H. Kalivas,et al.  Simulated‐annealing‐based optimization algorithms: Fundamentals and wavelength selection applications , 1995 .

[34]  Dong-Sheng Cao,et al.  A bootstrapping soft shrinkage approach for variable selection in chemical modeling. , 2016, Analytica chimica acta.

[35]  Qingsong Xu,et al.  Elastic Net Grouping Variable Selection Combined with Partial Least Squares Regression (EN-PLSR) for the Analysis of Strongly Multi-Collinear Spectroscopic Data , 2011, Applied spectroscopy.

[36]  Yvan Vander Heyden,et al.  Predictive-property-ranked variable reduction with final complexity adapted models in partial least squares modeling for multiple responses. , 2013, Analytical chemistry.

[37]  Dong-Sheng Cao,et al.  An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration. , 2013, Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy.