The successive projections algorithm for interval selection in PLS

The successive projections algorithm (SPA) is aimed at selecting a subset of variables with small multicollinearity and suitable prediction power for use in Multiple Linear Regression (MLR). The resulting SPA-MLR models are simpler and easier to interpret than latent-variable models such as Partial Least Squares (PLS). However, PLS tends to be less sensitive to instrumental noise. The present paper proposes an extension of SPA that combines the noise-reduction properties of PLS with the ability of SPA to discard non-informative variables. For this purpose, SPA is modified to select intervals of variables for use in PLS. The proposed iSPA-PLS algorithm is evaluated in two case studies involving near-infrared spectrometric analysis of wheat and beer extract samples. Compared with full-spectrum PLS, the resulting iSPA-PLS models exhibited better performance in terms of both cross-validation and external prediction. On the other hand, iSPA-PLS and SPA-MLR presented similar cross-validation performance, but the iSPA-PLS models clearly outperformed SPA-MLR in external prediction. These results indicate that iSPA-PLS may be more robust with respect to differences between the external prediction set and the calibration set used in the cross-validation procedure.
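The core SPA mechanism, which selects variables with small collinearity through successive orthogonal projections, can be illustrated by a minimal NumPy sketch of a single projection chain. It follows the commonly published formulation of SPA. The function name `spa_chain` and its arguments are illustrative assumptions, and the sketch does not reproduce the interval-selection modification proposed for iSPA-PLS or the MLR/PLS model building and validation steps.

```python
import numpy as np

def spa_chain(X, k0, M):
    """Return the indices of M columns of X chosen by one SPA projection chain.

    X  : (N, K) calibration matrix (samples x spectral variables)
    k0 : index of the starting column
    M  : number of variables to select (should not exceed the rank of X)
    """
    Xp = np.array(X, dtype=float)  # working copy whose columns are projected in place
    selected = [k0]
    for _ in range(1, M):
        xk = Xp[:, selected[-1]]   # last selected (already projected) column
        # Project every column onto the subspace orthogonal to xk:
        # x_j <- x_j - x_k (x_k^T x_j) / (x_k^T x_k)
        coef = (xk @ Xp) / (xk @ xk)
        Xp = Xp - np.outer(xk, coef)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0     # never pick a column already in the chain
        # The column with the largest residual norm is the least collinear
        # with the variables selected so far.
        selected.append(int(np.argmax(norms)))
    return selected

# Toy example: columns (1,0,0), (0,1,0), (0,0,1), (1,1,0)
X = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
print(spa_chain(X, k0=3, M=3))  # [3, 2, 0]
```

In the usual SPA workflow, a chain like this is built for every starting column and every candidate subset size, and the subset giving the lowest validation error of the resulting regression model is retained.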