A test of significance for partial least squares regression

Partial least squares (PLS) regression is a widely used statistical technique for multivariate calibration, especially when there are more variables than samples. Every user of PLS must choose the number of factors to include in a model, but the choice is complicated by the large number of empirical tests available. In most instances predictive ability is the most desired property of a PLS model, so interest has centred on basing this choice on an internal validation process. A popular approach is to calculate a cross-validated r² to gauge how much of the variance in the dependent variable can be explained by leave-one-out predictions. Using Monte Carlo simulations for data sets of different sizes, the influence of chance effects on the cross-validation process is investigated. The results are presented as tables of critical values, against which the cross-validated r² obtained from the user's own data set can be compared. This gives a formal test of the predictive ability of a PLS model with a given number of dimensions.
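
The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a single response variable fitted by a plain NIPALS PLS1 routine, cross-validated r² defined as 1 − PRESS/SS, null data drawn as independent standard normal variates, and a one-sided 5% level. The function names `pls1_coefficients`, `loo_q2`, and `critical_q2` are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model fitted by NIPALS to centred X and y."""
    X = X.copy()
    y = y.copy()
    ws, ps, qs = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # no covariance left to model
            break
        w = w / norm
        t = X @ w                 # score vector
        tt = t @ t
        p = X.T @ t / tt          # X loading
        q = (y @ t) / tt          # y loading
        X -= np.outer(t, p)       # deflate X
        y -= q * t                # deflate y
        ws.append(w)
        ps.append(p)
        qs.append(q)
    if not ws:
        return np.zeros(X.shape[1])
    W, P = np.column_stack(ws), np.column_stack(ps)
    return W @ np.linalg.solve(P.T @ W, np.array(qs))


def loo_q2(X, y, n_components):
    """Leave-one-out cross-validated r2, assumed here to be 1 - PRESS/SS."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        X_train, y_train = X[keep], y[keep]
        x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
        b = pls1_coefficients(X_train - x_mean, y_train - y_mean, n_components)
        y_hat = y_mean + (X[i] - x_mean) @ b
        press += (y[i] - y_hat) ** 2
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss


def critical_q2(n_samples, n_variables, n_components,
                n_sim=1000, alpha=0.05, seed=0):
    """Upper-alpha quantile of cross-validated r2 when X and y are unrelated.

    Assumption: the null data are independent standard normal draws.
    """
    rng = np.random.default_rng(seed)
    null_values = np.empty(n_sim)
    for s in range(n_sim):
        X = rng.standard_normal((n_samples, n_variables))
        y = rng.standard_normal(n_samples)
        null_values[s] = loo_q2(X, y, n_components)
    return np.quantile(null_values, 1.0 - alpha)
```

Under these assumptions, a call such as `critical_q2(20, 50, 2)` would give one simulated critical value for 20 samples, 50 variables, and two PLS dimensions. A cross-validated r² from real data of the same size that exceeds it would be judged unlikely to have arisen by chance at the chosen level.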
