A gradient descent boosting spectrum modeling method based on back interval partial least squares

When boosting regression is applied to near-infrared spectroscopy, the full spectrum of the samples is generally used for partial least squares (PLS) modeling. However, the full spectrum contains a large amount of redundant information and noise, which both increases the complexity of the model and reduces its predictive performance. In addition, boosting is sensitive to noise in the data: when the data contain too much noise, the generalization performance of boosting decreases, and the prediction error and variance of PLS become relatively large. To address these problems, this paper proposes a gradient descent boosting ensemble method combined with backward interval PLS (GD-Boosting-BiPLS). BiPLS selects the effective variables for the boosting base model, and the base models are trained sequentially by resampling. The spectral segmentation parameter of BiPLS and the iteration parameter of boosting are fused, and the weight of each base model is assigned by a gradient descent strategy. This yields a new ensemble model (a forward additive model) that is built in the direction of decreasing residuals. The final model is the ensemble model with the minimum root mean square error of prediction (RMSEP). The proposed method is applied to the quantitative prediction of ethanol concentrations. Over iterations 1-50, the average correlation coefficients of the calibration and validation sets are 0.9628 and 0.9388, and the average RMSE of cross-validation and RMSEP are 0.0732 and 0.0675, respectively. The overall performance of the proposed GD-Boosting-BiPLS method is compared with that of various ensemble strategies and four state-of-the-art spectral modeling methods. The experimental results show that the proposed method has the best generalization performance and stability.
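The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is one plausible reading under stated assumptions. All function names (`pls1_fit`, `bipls_select`, `gd_boosting_bipls`) are hypothetical, the PLS base learner is a plain NIPALS PLS1 written in NumPy, "fusing" the two parameters is assumed to mean that iteration m uses m + 1 spectral intervals, the base-model weight is taken as the exact line-search step for squared loss, and the data are synthetic rather than ethanol spectra.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """PLS1 via NIPALS. Returns (coef, x_mean, y_mean)."""
    n_components = min(n_components, X.shape[1], X.shape[0] - 1)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to model
            break
        w = w / norm
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsecv(X, y, cols, n_components=2, folds=5):
    """Cross-validated RMSE of PLS1 restricted to the columns in `cols`."""
    idx = np.arange(len(y))
    pred = np.empty(len(y))
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        model = pls1_fit(X[np.ix_(train, cols)], y[train], n_components)
        pred[test] = pls1_predict(model, X[np.ix_(test, cols)])
    return rmse(y, pred)

def bipls_select(X, y, n_intervals, n_components=2):
    """Backward interval selection: drop one interval at a time, keep the best subset."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    active = list(range(n_intervals))
    best_cols = np.concatenate(intervals)
    best_err = rmsecv(X, y, best_cols, n_components)
    while len(active) > 1:
        trials = []
        for i in active:
            cols = np.concatenate([intervals[j] for j in active if j != i])
            trials.append((rmsecv(X, y, cols, n_components), i, cols))
        err, drop, cols = min(trials, key=lambda trial: trial[0])
        active.remove(drop)
        if err < best_err:
            best_err, best_cols = err, cols
    return best_cols

def gd_boosting_bipls(X_tr, y_tr, X_val, y_val, n_iter=5, n_components=2, seed=0):
    """Forward additive ensemble; returns (best iteration, validation RMSE per iteration)."""
    rng = np.random.default_rng(seed)
    f_tr = np.full(len(y_tr), y_tr.mean())
    f_val = np.full(len(y_val), y_tr.mean())
    history = []
    for m in range(1, n_iter + 1):
        resid = y_tr - f_tr                            # negative gradient of squared loss
        boot = rng.integers(0, len(y_tr), len(y_tr))   # resampled training set
        # Assumption: the segmentation parameter is tied to the iteration index.
        cols = bipls_select(X_tr[boot], resid[boot], n_intervals=m + 1,
                            n_components=n_components)
        model = pls1_fit(X_tr[np.ix_(boot, cols)], resid[boot], n_components)
        h_tr = pls1_predict(model, X_tr[:, cols])
        h_val = pls1_predict(model, X_val[:, cols])
        denom = h_tr @ h_tr
        rho = (resid @ h_tr) / denom if denom > 0 else 0.0  # line-search weight
        f_tr = f_tr + rho * h_tr
        f_val = f_val + rho * h_val
        history.append(rmse(y_val, f_val))
    return int(np.argmin(history)) + 1, history

# Synthetic stand-in for spectra: only variables 10-14 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 40))
y = X[:, 10:15].sum(axis=1) + 0.1 * rng.normal(size=60)
best_m, history = gd_boosting_bipls(X[:40], y[:40], X[40:], y[40:])
baseline = rmse(y[40:], np.full(20, y[:40].mean()))
print(best_m, history[best_m - 1], baseline)
```

The sketch keeps the ensemble whose validation error is lowest, mirroring the abstract's choice of the model with minimum RMSEP. A real application would select the number of PLS components by cross-validation and evaluate the chosen ensemble on a separate prediction set.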
Highlights
- An ensemble model of gradient descent boosting and BiPLS is proposed.
- Using BiPLS as the base model reduces the sensitivity of boosting to noise.
- The gradient descent boosting strategy improves the performance of the base models.
- The iteration parameter and the segmentation parameter are fused to simplify the model.
- The final ensemble model remains stable for different initial numbers of iterations.
