Random Subspace Regression Ensemble for Near-Infrared Spectroscopic Calibration of Tobacco Samples

An ensemble is a model-independent technique that combines several models for a classification or regression task, and it can achieve an accuracy that is often not attainable with a single model. Such combinations have gained increasing attention in many fields. This paper proposes a random subspace (RS)-based regression ensemble as an alternative method for near-infrared (NIR) spectroscopic calibration of tobacco samples. Because each random subspace contains considerably fewer variables than the full spectrum, multiple linear regression (MLR) is used as the base algorithm, and the method is therefore referred to as RS-MLR. The overall performance of the proposed RS-MLR method is compared with that of partial least squares regression (PLSR), kernel principal component regression (KPCR) and kernel partial least squares regression (KPLSR). The results reveal that the RS-MLR method is not only conceptually simple but also produces a more parsimonious and more accurate calibration model than PLSR, KPCR and KPLSR, at a lower computational cost. We also found that the RS-MLR method is well suited to so-called small-sample problems and that the calibration models built by RS-MLR are less prone to overfitting.
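
The RS-MLR idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of ensemble members, the subspace size, or the rule for combining member predictions, so the names `fit_rs_mlr` and `predict_rs_mlr`, the parameters `n_models` and `subspace_size`, and the use of a simple average are all assumptions made for illustration.

```python
import numpy as np


def fit_rs_mlr(X, y, n_models=50, subspace_size=5, seed=0):
    """Fit an ensemble of MLR models, each on a random subset of variables.

    n_models, subspace_size and seed are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    models = []
    for _ in range(n_models):
        # Draw a random subspace: a subset of variables without replacement.
        idx = rng.choice(n_vars, size=subspace_size, replace=False)
        # Ordinary least squares with an intercept column on the subspace.
        A = np.column_stack([np.ones(n_samples), X[:, idx]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        models.append((idx, coef))
    return models


def predict_rs_mlr(models, X):
    """Average the member predictions (the combination rule is an assumption)."""
    preds = [coef[0] + X[:, idx] @ coef[1:] for idx, coef in models]
    return np.mean(preds, axis=0)


if __name__ == "__main__":
    # Synthetic stand-in for spectra: 40 samples, 200 "wavelengths".
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 200))
    y = X[:, :10].sum(axis=1) + 0.1 * rng.normal(size=40)
    ensemble = fit_rs_mlr(X, y, n_models=100, subspace_size=8)
    print(predict_rs_mlr(ensemble, X[:5]))
```

Keeping `subspace_size` well below the number of calibration samples is what makes plain MLR usable here: each member solves a small, overdetermined least-squares problem even when the full spectrum has far more variables than samples.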
