A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents.

This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analysed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet-visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. The proposed Adaptive MLR-SPA approach circumvents this problem and yields prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial least squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense.
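As a rough illustration of the idea, the sketch below implements the projection chain of conventional SPA in Python with NumPy and adds an optional per-variable weight. The function name `spa_chain` and the `weights` argument are assumptions made for this example. The abstract does not state how the modified algorithm measures the effect of the interferent, so the weighting here is only a hypothetical stand-in for "favouring variables where the interferent is less pronounced", not the authors' procedure.

```python
import numpy as np

def spa_chain(X, start, n_select, weights=None):
    """Return indices chosen by an SPA-style projection chain.

    X        : (samples x variables) matrix of calibration spectra
    start    : index of the initial variable
    n_select : total number of variables to select
    weights  : optional per-variable factors in [0, 1]; HYPOTHETICAL device
               to down-weight variables suspected of interference
    """
    Xp = np.asarray(X, dtype=float).copy()
    Xp = Xp - Xp.mean(axis=0)          # mean-centre each column
    if weights is not None:
        Xp = Xp * np.asarray(weights)  # assumed weighting, not from the paper
    selected = [start]
    for _ in range(n_select - 1):
        ref = Xp[:, selected[-1]]
        # Project all columns onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(ref, ref @ Xp) / (ref @ ref)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0         # never re-select a variable
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data: three mutually orthogonal, zero-mean "wavelengths"
X = np.array([[1, 0, 0], [-1, 0, 0],
              [0, 2, 0], [0, -2, 0],
              [0, 0, 3], [0, 0, -3]], dtype=float)

plain = spa_chain(X, start=0, n_select=3)
# Pretend variable 2 is strongly affected by an interferent
weighted = spa_chain(X, start=0, n_select=3, weights=[1.0, 1.0, 0.1])
```

In the toy data, the unweighted chain picks the largest-norm variable (index 2) right after the starting one, whereas down-weighting it defers its selection. In a full MLR-SPA workflow the chains from different starting variables would then be compared by the prediction error of the resulting MLR models; that step is omitted here.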
