Variable space boosting partial least squares for multivariate calibration of near-infrared spectroscopy

Abstract

A novel boosting strategy that builds sub-models along the variable direction, named variable space boosting partial least squares (VS-BPLS), is proposed for multivariate calibration of near-infrared (NIR) spectroscopy. In the first cycle, all variables in the training set are given the same sampling weight, and a certain number of variables are then drawn according to the distribution of these weights to build a PLS sub-model. In the following cycles, the sampling weights of the variables are set by a predefined loss function, which measures the error between the known spectra and the spectra reconstructed by the PLS sub-model as the product of its scores and loadings. The final prediction for an unknown sample is the weighted average of the predictions of all sub-models. The proposed method not only handles the small-sample problem but also removes redundant information among the variables. The performance of VS-BPLS is tested on two NIR spectral datasets. For comparison, conventional PLS and two variable selection methods, Monte Carlo uninformative variable elimination PLS (MCUVE-PLS) and randomization test PLS (RT-PLS), are also investigated. The results show that, for small-sample problems, VS-BPLS outperforms PLS, MCUVE-PLS and RT-PLS in both prediction accuracy and stability.
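The abstract describes the VS-BPLS loop only in outline, so the following Python/NumPy code is one possible reading of it, not the authors' implementation. It keeps the parts the abstract states: equal initial weights, weight-driven variable sampling, a loss on spectra reconstructed as scores times loadings, and a weighted-average prediction. The unstated parts are filled with assumptions, each marked `ASSUMPTION` in the code: PLS1 fitted by NIPALS, sampling without replacement, the loss taken as each variable's RMS reconstruction error divided by the largest one, an exponential reweighting that raises the weight of poorly reconstructed variables, and sub-model weights equal to the inverse training RMSE. The function names `pls1_nipals`, `vs_bpls_fit` and `vs_bpls_predict` and all default parameter values are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """PLS1 via NIPALS on column-centred X (n x m) and centred y (n,).

    Returns X-weights W, scores T, X-loadings P and y-loadings q.
    """
    X, y = np.array(X, dtype=float), np.array(y, dtype=float)  # copies
    n, m = X.shape
    W, P = np.zeros((m, n_comp)), np.zeros((m, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        X -= np.outer(t, P[:, a])  # deflate X
        y -= q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, P, q


def vs_bpls_fit(X, y, n_models=50, n_vars=30, n_comp=3, seed=0):
    """Hypothetical VS-BPLS training loop; a sketch, not the authors' code."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    m = X.shape[1]
    w = np.full(m, 1.0 / m)  # first cycle: equal sampling weights
    models = []
    for _ in range(n_models):
        # ASSUMPTION: draw n_vars distinct variables with probabilities w
        idx = rng.choice(m, size=n_vars, replace=False, p=w)
        W, T, P, q = pls1_nipals(Xc[:, idx], yc, n_comp)
        b = W @ np.linalg.solve(P.T @ W, q)  # regression vector on idx
        rmsec = np.sqrt(np.mean((Xc[:, idx] @ b - yc) ** 2))
        # Loadings for ALL variables from the (orthogonal) scores, then
        # reconstruct the training spectra as X_hat = T @ P_all.T
        P_all = Xc.T @ T / np.sum(T * T, axis=0)
        err = np.sqrt(np.mean((Xc - T @ P_all.T) ** 2, axis=0))
        # ASSUMPTION: loss = per-variable RMS error scaled to [0, 1]
        loss = err / max(err.max(), 1e-12)
        # ASSUMPTION: exponential reweighting favours poorly reconstructed variables
        w = w * np.exp(loss)
        w /= w.sum()
        # ASSUMPTION: sub-model weight = 1 / training RMSE
        models.append((idx, b, 1.0 / (rmsec + 1e-12)))
    return x_mean, y_mean, models


def vs_bpls_predict(fit, X_new):
    """Weighted average of all sub-model predictions."""
    x_mean, y_mean, models = fit
    Xc = np.asarray(X_new, dtype=float) - x_mean
    preds = np.array([Xc[:, idx] @ b for idx, b, _ in models])
    alphas = np.array([alpha for _, _, alpha in models])
    return y_mean + alphas @ preds / alphas.sum()
```

For example, `fit = vs_bpls_fit(X_train, y_train, n_models=50, n_vars=30, n_comp=3)` followed by `vs_bpls_predict(fit, X_test)` returns one prediction per test spectrum. Because NIPALS scores are mutually orthogonal, the loadings of every variable, including those not drawn for a given sub-model, can be obtained by projecting the centred spectra onto the scores. This is what lets the reconstruction loss cover the whole spectrum rather than only the sampled variables.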
