A consensus PLS method based on diverse wavelength variables models for analysis of near-infrared spectra

Abstract A new method, diverse variables-consensus partial least squares (DV-CPLS), is proposed that combines a consensus (ensemble) strategy with the uninformative variable elimination (UVE) technique. Instead of varying the training subset as in conventional consensus methods, the approach uses UVE-PLS to construct member models with different numbers of variables (wavelengths); the predictions of the member models are then combined by a new weighted-averaging scheme to give the ensemble result. DV-CPLS is applied to build a quantitative model relating diesel near-infrared (NIR) spectra to cetane number (CN), and the results show good prediction capability in terms of accuracy and robustness. When DV-CPLS is further combined with the wavelet transform (WT), a more parsimonious model is obtained. The proposed method improves on conventional linear PLS modeling for determining diesel CN from NIR spectra. It is hoped that the method will aid further investigation of consensus modeling and variable selection techniques, as well as applications in NIR and other spectral analysis of complex systems.
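
The general idea described above (member models built on variable subsets of different sizes, then combined by weighted averaging) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the inverse squared validation RMSE used here is an assumption, as are the ranking of variables by a leave-one-out stability score (mean coefficient divided by its standard deviation, in the spirit of UVE but without the added noise variables), the subset sizes, the number of PLS components, and the synthetic data standing in for NIR spectra. All function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model by NIPALS; return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qk = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qk * t                # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def uve_stability(X, y, n_comp):
    """Assumed UVE-like score: |mean| / std of leave-one-out PLS coefficients."""
    coefs = np.array([
        pls1_fit(np.delete(X, i, axis=0), np.delete(y, i), n_comp)[0]
        for i in range(len(y))
    ])
    return np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0) + 1e-12)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in for spectra: only the first 5 of 40 variables are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 40))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.5]) + 0.1 * rng.normal(size=70)
X_tr, y_tr = X[:40], y[:40]
X_val, y_val = X[40:55], y[40:55]
X_te, y_te = X[55:], y[55:]

n_comp = 3                               # assumed number of latent variables
order = np.argsort(-uve_stability(X_tr, y_tr, n_comp))
sizes = [5, 10, 20, 40]                  # assumed variable counts for member models

member_preds, val_errors = [], []
for k in sizes:
    idx = order[:k]                      # keep the k most stable variables
    coef, b0 = pls1_fit(X_tr[:, idx], y_tr, n_comp)
    val_errors.append(rmse(X_val[:, idx] @ coef + b0, y_val))
    member_preds.append(X_te[:, idx] @ coef + b0)

member_preds = np.array(member_preds)
weights = 1.0 / np.square(val_errors)    # assumed weighting: inverse squared RMSE
weights /= weights.sum()
consensus = weights @ member_preds       # weighted average of member predictions

print("member weights:", np.round(weights, 3))
print("consensus test RMSE:", round(rmse(consensus, y_te), 3))
```

Because the weights are non-negative and sum to one, each consensus prediction is a convex combination of the member predictions, so a poorly performing member model is down-weighted rather than discarded.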
