Interpretation of variable importance in Partial Least Squares with Significance Multivariate Correlation (sMC)

Abstract: Despite its growing popularity and success in many modeling applications, Partial Least Squares (PLS) regression continues to pose challenges when identifying important variables. This article describes the relationship between the regression coefficients and the orthogonally decomposed variances in PLS. The relation between prediction, model interpretation, and the determination of important variables is described using the theory of the basic sequence, presented here as a special case of the well-known Krylov sequence (or the power method). Variable selection methods, e.g., Selectivity Ratio (SR) and Variable Importance in the Projection (VIP), are also described in this framework. We show that the interpretation can be affected by unnecessary rotation toward the main source of variance in the X-block. Significance Multivariate Correlation (sMC) is developed using the knowledge obtained from the basic sequence to minimize the effect of irrelevant X-structures. At the same time, sMC highlights the variables most correlated with the response. The performance of sMC is demonstrated on simulated and real datasets against commonly used variable selection methods such as VIP and SR.
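To make the link between PLS regression coefficients and the Krylov sequence concrete, the following is a minimal sketch, not the authors' implementation. It relies on the standard result for single-response PLS that the a-component regression vector lies in the span of X'y, (X'X)X'y, ..., (X'X)^(a-1)X'y. The function names `krylov_basis` and `pls_coefficients` and the simulated data are assumptions made for illustration only, and X and y are assumed to be column-centered.

```python
import numpy as np


def krylov_basis(X, y, n_components):
    """Orthonormal basis of span{s, Ss, ..., S^(a-1) s}, with S = X'X and s = X'y."""
    S = X.T @ X
    v = X.T @ y
    cols = []
    for _ in range(n_components):
        v = v / np.linalg.norm(v)  # rescaling keeps the span, avoids overflow
        cols.append(v)
        v = S @ v
    K = np.column_stack(cols)
    Q, _ = np.linalg.qr(K)  # orthonormalize for numerical stability
    return Q


def pls_coefficients(X, y, n_components):
    """Single-response PLS coefficients as a least-squares fit restricted to the Krylov subspace."""
    Q = krylov_basis(X, y, n_components)
    S = X.T @ X
    s = X.T @ y
    return Q @ np.linalg.solve(Q.T @ S @ Q, Q.T @ s)


# Simulated, column-centered data (illustrative assumption, not from the article).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
y -= y.mean()

s = X.T @ y
b_one = pls_coefficients(X, y, 1)   # one component: proportional to X'y
b_full = pls_coefficients(X, y, 4)  # all components: ordinary least squares
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With one component the coefficient vector points along X'y, and with as many components as variables it coincides with the ordinary least-squares solution. Intermediate choices trade off between the two, which is where the rotation toward dominant X-variance discussed in the abstract can enter.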