Successive projections algorithm combined with uninformative variable elimination for spectral variable selection

Abstract: The UVE–SPA method, which combines the successive projections algorithm (SPA) with uninformative variable elimination (UVE), is proposed as a novel variable selection approach for multivariate calibration. UVE is first used to select informative variables, and SPA is then applied to select, from those informative variables, a subset with minimum redundant information. The proposed method was applied to near-infrared (NIR) reflectance data for the analysis of nicotine in tobacco lamina and to NIR transmission data for the analysis of the active pharmaceutical ingredient (API) in single tablets. For the elimination of uninformative variables, UVE performed better on first-derivative spectra than on raw spectra. For variable selection, UVE–SPA selected fewer variables and gave better performance than direct SPA. For NIR spectral analysis of nicotine in tobacco lamina, the number of variables selected from 3001 spectral variables fell from 48 with direct SPA to 35 with UVE–SPA, and the root mean squared error of prediction (RMSEP) of the corresponding multiple linear regression (MLR) models decreased from 0.174 (%, mg/mg) to 0.160. For NIR spectral analysis of the API in single tablets, the number of variables selected from 650 spectral variables fell from 46 with direct SPA to 17 with UVE–SPA, and the RMSEP of the corresponding MLR models decreased from 0.842 (%, mg/mg) to 0.473. The MLR model built on the variables selected by UVE–SPA predicted better than the full-spectrum partial least-squares (PLS) model and was comparable to the PLS model built on the variables retained by UVE.
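The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the function names (`pls1_coefs`, `uve_select`, `spa_chain`, `uve_spa`), the number of PLS components, the scale of the artificial noise variables, the use of the maximum noise reliability as the UVE cutoff, and a single SPA chain started from the column with the largest norm. A full SPA would instead compare chains from every starting variable and of every length by the validation error of the resulting MLR models.

```python
import numpy as np


def pls1_coefs(X, y, n_comp):
    """Regression coefficients of a PLS1 model (NIPALS, mean-centred data)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    W = np.zeros((p, n_comp))
    P = np.zeros((p, n_comp))
    q = np.zeros(n_comp)
    for a in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                      # scores
        tt = t @ t
        p_a = Xc.T @ t / tt             # X loadings
        q[a] = yc @ t / tt              # y loading
        Xc = Xc - np.outer(t, p_a)      # deflate X
        yc = yc - q[a] * t              # deflate y
        W[:, a], P[:, a] = w, p_a
    return W @ np.linalg.solve(P.T @ W, q)


def uve_select(X, y, n_comp=3, seed=0):
    """UVE: keep variables whose coefficient reliability exceeds that of noise.

    Assumption: noise scale 1e-10 and a max-of-noise cutoff.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    noise = rng.uniform(size=(n, p)) * 1e-10
    Xa = np.hstack([X, noise])
    # Leave-one-out PLS coefficients for real and artificial variables.
    B = np.array([
        pls1_coefs(np.delete(Xa, i, axis=0), np.delete(y, i), n_comp)
        for i in range(n)
    ])
    reliability = B.mean(axis=0) / B.std(axis=0)
    cutoff = np.abs(reliability[p:]).max()
    return np.flatnonzero(np.abs(reliability[:p]) > cutoff)


def spa_chain(X, start, n_select):
    """SPA: one chain of columns, each least collinear with the previous ones."""
    Xp = X - X.mean(axis=0)
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to v.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0          # never pick a column twice
        selected.append(int(np.argmax(norms)))
    return selected


def uve_spa(X, y, n_comp=3, n_select=5):
    """UVE first, then SPA restricted to the variables UVE retained."""
    informative = uve_select(X, y, n_comp)
    if informative.size == 0:
        return informative
    Xi = X[:, informative]
    # Assumption: start from the retained column with the largest norm.
    start = int(np.argmax(np.linalg.norm(Xi - Xi.mean(axis=0), axis=0)))
    k = min(n_select, Xi.shape[1], X.shape[0] - 1)
    return informative[spa_chain(Xi, start, k)]
```

With a spectra matrix `X` (samples by variables) and a reference vector `y`, `uve_spa(X, y)` returns column indices into `X`; an MLR model would then be fitted on `X[:, indices]`.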
