Principal components transform-partial least squares: a novel method to accelerate cross-validation in PLS regression

Abstract: This work proposes a new approach to building PLS regression models, Principal Components Transform-PLS (PCT-PLS), in which a full eigendecomposition of the X matrix is computed (by NIPALS) before the PLS regression is carried out. The method dramatically accelerates the cross-validation of calibration models while remaining parsimonious in computer memory. The gain is most noticeable for the very large data sets that are now common. The approach preserves the modeling properties of PLS, such as robustness and the interpretability of the regression vector, so it can be applied to building calibration models in the same way as conventional PLS. The proposed technique should allow PLS modeling to be applied to much larger data sets than was previously feasible.
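The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, under stated assumptions: X is decomposed once into scores T and loadings P, cross-validation is then run on the small score matrix T instead of on X, and the regression vector is mapped back to the original variables with P. The function names (`pct`, `pls1`, `cv_press`) are hypothetical, the decomposition uses an SVD in place of the NIPALS algorithm named in the abstract, and the PLS step is a plain single-response PLS1, none of which is confirmed as the authors' implementation.

```python
import numpy as np

def pct(X, tol=1e-10):
    """Full principal components transform of X via SVD (stand-in for NIPALS).

    Returns scores T (n x r) and loadings P (p x r) with X = T @ P.T.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > s[0] * tol          # drop numerically null components
    return U[:, keep] * s[keep], Vt[keep].T

def pls1(X, y, n_components):
    """PLS1 regression vector for column-centered X and centered y."""
    X = X.copy()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)     # weight vector
        t = X @ w                  # score vector
        tt = t @ t
        p = X.T @ t / tt           # X loading
        q[a] = y @ t / tt          # y loading
        X -= np.outer(t, p)        # deflate X
        W[:, a], P[:, a] = w, p
    return W @ np.linalg.solve(P.T @ W, q)

def cv_press(Z, y, n_components, n_folds=5):
    """PRESS from k-fold cross-validation; Z may be X itself or its scores T."""
    folds = np.arange(Z.shape[0]) % n_folds
    press = 0.0
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        zm, ym = Z[tr].mean(axis=0), y[tr].mean()
        b = pls1(Z[tr] - zm, y[tr] - ym, n_components)
        pred = (Z[te] - zm) @ b + ym
        press += np.sum((y[te] - pred) ** 2)
    return press

# Synthetic "wide" data: few objects, many variables (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

T, P = pct(X)                      # done once, before cross-validation
print(X.shape, "->", T.shape)      # folds now operate on the small T matrix
print(cv_press(X, y, 3), cv_press(T, y, 3))

# Regression vector in the original variable space, recovered from the scores.
b_X = P @ pls1(T - T.mean(axis=0), y - y.mean(), 3)
```

Because P has orthonormal columns, each PLS step on T mirrors the corresponding step on X, so in this sketch the two PRESS values agree up to rounding while every fold works on an n x r matrix (r at most the number of objects) instead of an n x p one.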
