A Bootstrapping Soft Shrinkage Approach and Interval Random Variables Selection Hybrid Model for Variable Selection in Near-Infrared Spectroscopy

The high dimensionality of spectral datasets is a significant challenge for researchers. It calls for effective methods that extract an optimal variable subset and thereby improve the accuracy of predictions or classifications. In this study, we propose a hybrid variable selection method based on an incremental number of variables, which combines the bootstrapping soft shrinkage (BOSS) method with the interval random variable selection (IRVS) method and is named BOSS-IRVS. BOSS is used to determine the informative intervals, while IRVS is used to search for informative variables within the intervals determined by BOSS. The proposed BOSS-IRVS method was tested on seven publicly accessible near-infrared (NIR) spectroscopic datasets covering corn, diesel fuel, soy, wheat protein, and hemoglobin. Its performance was compared with that of two well-established variable selection methods, namely BOSS and a hybrid variable selection strategy based on continuous shrinkage of variable space (VCPA-IRIV). The experimental results show that BOSS-IRVS outperforms VCPA-IRIV and BOSS on all tested datasets, improving prediction accuracy by 15.4% and 15.3% for corn moisture, 13.4% and 49.8% for corn oil, 41.5% and 50.6% for corn protein, 12.6% and 5.6% for soy moisture, 0.6% and 6.3% for total diesel fuel, 19.9% and 14.3% for wheat protein, and 5.8% and 20.3% for hemoglobin.
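The two-stage idea described above (first locate an informative interval, then search for informative variables inside it) can be illustrated with a minimal sketch. This is not the authors' BOSS-IRVS implementation. Several things here are assumptions made only for illustration: the function names (`pick_interval`, `search_in_interval`, `score`), the fixed interval width, the synthetic data, and above all the scoring function, which replaces a cross-validated PLS model with the absolute correlation between the mean of the chosen columns and the response so that the example needs only the standard library.

```python
import random

def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5

def score(X, y, cols):
    """Assumed stand-in for model quality (NOT a PLS model):
    |corr| between the mean of the chosen columns and y."""
    cols = sorted(cols)
    feature = [sum(row[c] for c in cols) / len(cols) for row in X]
    return abs_corr(feature, y)

def pick_interval(X, y, width, rng, n_iter=20, n_models=50, keep=0.2):
    """Stage 1 (BOSS-like, simplified): weighted bootstrap over intervals."""
    n_vars = len(X[0])
    intervals = [list(range(s, min(s + width, n_vars)))
                 for s in range(0, n_vars, width)]
    ids = range(len(intervals))
    weights = [1.0] * len(intervals)
    for _ in range(n_iter):
        models = []
        for _ in range(n_models):
            # Draw intervals with replacement, then keep the unique ones.
            drawn = set(rng.choices(ids, weights=weights, k=len(intervals)))
            cols = [c for i in drawn for c in intervals[i]]
            models.append((score(X, y, cols), drawn))
        models.sort(key=lambda m: m[0], reverse=True)
        best = models[:max(1, int(keep * n_models))]
        # Soft shrinkage: weights follow how often an interval appears in the
        # best sub-models; a small floor avoids hard elimination.
        weights = [max(sum(i in d for _, d in best) / len(best), 1e-3)
                   for i in ids]
    return intervals[max(ids, key=weights.__getitem__)]

def search_in_interval(X, y, interval, rng, n_trials=200):
    """Stage 2 (IRVS-like, simplified): random subsets inside one interval."""
    best_cols, best_score = list(interval), score(X, y, interval)
    for _ in range(n_trials):
        cols = rng.sample(interval, rng.randint(1, len(interval)))
        s = score(X, y, cols)
        if s > best_score:
            best_cols, best_score = sorted(cols), s
    return best_cols, best_score

# Synthetic "spectra": 40 samples, 30 variables; only columns 12 and 13
# carry signal (an assumption made purely for this demonstration).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(40)]
y = [row[12] + row[13] + rng.gauss(0, 0.1) for row in X]

interval = pick_interval(X, y, width=6, rng=rng)
selected, quality = search_in_interval(X, y, interval, rng)
print("interval:", interval)
print("selected:", selected, "score:", round(quality, 3))
```

Because stage 2 starts from the full interval and only accepts strict improvements, the returned subset never scores worse than the interval it was drawn from. In a realistic setting the `score` function would be replaced by a cross-validated calibration error, and the interval width and iteration counts would need tuning per dataset.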
