Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation

This study compares two variable selection methods for partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. Three data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography-mass spectrometry (GC-MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), levels of significance (p-value) and interpretation of the underlying experimental phenomena were used to discuss the best approach for variable selection in each case. For the water quality data set, the SR method was more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC-MS chromatograms were considered, the variables selected by the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, whereas the VIP method selected the variables most relevant for interpreting SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes. Copyright © 2015 John Wiley & Sons, Ltd.
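The abstract does not give the formulas behind the two scores, so the following is only a minimal sketch based on definitions commonly used in the chemometrics literature: VIP computed from the PLS weights and the y-variance explained per component, and SR computed as the ratio of explained to residual variance of each variable after target projection onto the normalised regression vector. The NIPALS routine, the function names (`pls1_nipals`, `vip_scores`, `selectivity_ratio`) and the synthetic data are assumptions made for illustration, not the authors' implementation or data.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model by NIPALS on mean-centred data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    W = np.zeros((p, n_components))   # unit-norm weight vectors
    T = np.zeros((n, n_components))   # scores
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    Xk, yk = Xc.copy(), yc.copy()
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p_a = Xk.T @ t / tt
        q_a = yk @ t / tt
        Xk = Xk - np.outer(t, p_a)    # deflate X
        yk = yk - q_a * t             # deflate y
        W[:, a], T[:, a], P[:, a], q[a] = w, t, p_a, q_a
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector for centred X
    return Xc, W, T, q, b

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a), with unit-norm w_a."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)              # y-variance explained per component
    w2 = (W / np.linalg.norm(W, axis=0))**2
    return np.sqrt(p * (w2 @ ss) / ss.sum())

def selectivity_ratio(Xc, b):
    """SR_j = explained / residual variance of variable j after target projection."""
    t_tp = Xc @ b / np.linalg.norm(b)             # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)            # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)
    explained = np.sum(X_tp**2, axis=0)
    residual = np.sum((Xc - X_tp)**2, axis=0)
    return explained / residual

# Synthetic illustration (hypothetical data): only variables 0 and 1 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)

Xc, W, T, q, b = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
sr = selectivity_ratio(Xc, b)
print("VIP:", np.round(vip, 2))
print("SR: ", np.round(sr, 2))
```

With unit-norm weights the squared VIP scores sum to the number of variables, which is why a cut-off of 1 is often used for VIP; SR has no such built-in scale, and cut-offs are typically derived from an F-test. Neither threshold is stated in the abstract, so both are mentioned here only as common practice.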
