The equivalence of partial least squares and principal component regression in the sufficient dimension reduction framework

Abstract

Partial least squares (PLS) and principal component regression (PCR) are two widely used dimension reduction techniques in chemometrics. However, the relationship between PLS and PCR is not fully understood. In this paper, we introduce the idea of sufficient dimension reduction (SDR) to chemometrics and show that PLS and PCR are both SDR methods. Furthermore, we show that the two methods are equivalent within the SDR framework, which means that PLS has no theoretical advantage over PCR in terms of prediction performance. This conclusion is supported by results on one simulated dataset and three real datasets.
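As a minimal population-level sketch (not the authors' derivation), the code below checks a known property that links the two methods in a linear model y = X^T β + ε with ε uncorrelated with X. The q-step PLS subspace is the Krylov space span{σ, Σσ, ..., Σ^(q-1)σ}, where Σ = Cov(X) and σ = Cov(X, y) = Σβ. When β lies in the span of m eigenvectors of Σ, that Krylov space with q = m equals the span of those eigenvectors, which is the subspace a PCR fit on exactly those components uses. The covariance matrix, its eigenvalues, the choice of "relevant" eigenvectors and the coefficients are arbitrary illustrative assumptions, and `krylov_basis` and `projector` are hypothetical helper names.

```python
import numpy as np


def krylov_basis(Sigma, sigma, q):
    """Orthonormal basis of span{sigma, Sigma sigma, ..., Sigma^(q-1) sigma}.

    At the population level this is the subspace spanned by q PLS weight vectors.
    """
    vecs = [sigma]
    for _ in range(q - 1):
        vecs.append(Sigma @ vecs[-1])
    K = np.column_stack(vecs)
    Q, _ = np.linalg.qr(K)  # reduced QR: Q has q orthonormal columns
    return Q


def projector(B):
    """Orthogonal projection onto the column span of B (B has orthonormal columns)."""
    return B @ B.T


rng = np.random.default_rng(0)
p, m = 6, 2

# Assumed population covariance with known orthonormal eigenvectors V.
V, _ = np.linalg.qr(rng.standard_normal((p, p)))
eigvals = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
Sigma = V @ np.diag(eigvals) @ V.T

# Assumed regression coefficients lying in the span of eigenvectors 1 and 4
# (the "relevant" components, which need not be the leading ones).
relevant = [1, 4]
beta = V[:, relevant] @ np.array([1.0, -2.0])
sigma_xy = Sigma @ beta  # Cov(X, y) when the noise is uncorrelated with X

P_pls = projector(krylov_basis(Sigma, sigma_xy, m))  # population PLS subspace
P_pcr = projector(V[:, relevant])                   # PCR on the relevant components
print(np.allclose(P_pls, P_pcr))
```

The sketch selects the relevant eigenvectors by construction; how PCR chooses which components to retain from finite data is a separate question that this check does not address.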
