Using consensus interval partial least squares in near-infrared spectral analysis

Abstract

This paper proposes a novel consensus modeling method for regression that optimizes the weight coefficients of the member models by accounting for both the errors of the member models and the correlations among those errors. As a result, the objective function being optimized has a clear physical meaning. Furthermore, the root-mean-square error of cross-validation (RMSECV) and the root-mean-square error of prediction (RMSEP) of the consensus model are lower than those of any of its member models. Integrating this method with the interval partial least squares (iPLS) algorithm yields a new consensus interval partial least squares (CPLS) algorithm. Typical near-infrared spectroscopy datasets are used to validate the effectiveness of CPLS. Compared with the commonly used partial least squares (PLS), iPLS, and stacked interval partial least squares (SPLS) algorithms, CPLS achieves better prediction performance.
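The abstract does not state the objective function explicitly. One standard way to weight member models using both their errors and the correlations among those errors is the minimum-variance combination. Collect the member residuals into a matrix C whose diagonal holds each member's mean squared error and whose off-diagonal entries hold the error cross-products, then minimize w^T C w subject to the weights summing to 1, which gives w = C^-1 1 / (1^T C^-1 1). The Python sketch below assumes that formulation. It is not necessarily the exact CPLS objective, and the function names `consensus_weights` and `rmse` are hypothetical. In CPLS the columns of `predictions` would presumably be the (cross-validated) predictions of PLS models built on individual spectral intervals. Here they are synthetic.

```python
import numpy as np


def consensus_weights(predictions: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Minimum-variance consensus weights (an assumed formulation, not necessarily CPLS's).

    predictions: (n_samples, n_models) array, one column per member model,
                 e.g. cross-validated predictions of interval PLS models.
    y:           (n_samples,) reference values.
    Returns weights that sum to 1.
    """
    errors = predictions - y[:, None]  # residuals of each member model
    # C[i, i] is member i's mean squared error; C[i, j] (i != j) measures how
    # strongly members i and j make the same mistakes (error correlation).
    C = errors.T @ errors / len(y)
    ones = np.ones(C.shape[0])
    # Minimizing w^T C w subject to sum(w) == 1 gives w proportional to C^-1 1.
    # np.linalg.solve assumes C is non-singular (no two members share identical errors).
    v = np.linalg.solve(C, ones)
    return v / v.sum()


def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Root-mean-square error, the quantity behind RMSECV and RMSEP."""
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


# Usage with synthetic data: members 0 and 1 share a common error component,
# so their errors are correlated; member 2 has independent errors.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
shared = rng.normal(scale=0.5, size=200)
preds = np.column_stack([
    y + shared + rng.normal(scale=0.3, size=200),
    y + shared + rng.normal(scale=0.6, size=200),
    y + rng.normal(scale=0.7, size=200),
])
w = consensus_weights(preds, y)
print("weights:", np.round(w, 3))
print("member RMSE:", [round(rmse(preds[:, k], y), 3) for k in range(3)])
print("consensus RMSE:", round(rmse(preds @ w, y), 3))
```

Because any single member corresponds to a feasible weight vector (1 for that member, 0 for the rest), the consensus error on the data used to fit the weights can never exceed that of the best member, which is consistent with the RMSECV claim in the abstract. On independent test data (RMSEP) the improvement is an empirical outcome rather than a guarantee. The weights are unconstrained in this sketch and can become negative; whether CPLS restricts them is not stated in the abstract.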
