A new universal resample-stable bootstrap-based stopping criterion for PLS component construction

We develop a new, robust stopping criterion for component construction in partial least squares regression (PLSR), characterized by a high level of stability. The criterion is universal in that it applies both to PLSR and to its extensions to generalized linear regression (PLSGLR). It is based on a non-parametric bootstrap technique and must be computed algorithmically. It allows each successive component to be tested at a preset significance level $\alpha$. To assess its performance and its robustness to various noise levels, we simulate datasets with a preset, known number of components. These simulations cover both the $n>p$ and the $n<p$ settings, where $n$ is the number of subjects and $p$ the number of covariates. We then use t-tests to compare the predictive performance of our approach with that of other common criteria. The stability property is tested in particular through resampling of a real allelotyping dataset. A further important conclusion is that the new criterion yields better overall predictive performance than existing criteria in both the PLSR and the PLSGLR (logistic and Poisson) frameworks.
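
As an illustration only, the idea of testing each successive component by non-parametric bootstrap can be sketched for the plain PLSR case as follows. This is a minimal sketch under stated assumptions, not the procedure of the paper: the function names `pls1_scores` and `select_n_components`, the choice to resample (response, score) pairs, and the use of a percentile interval are all hypothetical choices made here for the example. A component is retained when the bootstrap interval of its regression coefficient at level $1-\alpha$ excludes zero, and construction stops at the first component that fails this test.

```python
import numpy as np


def pls1_scores(X, y, n_comp):
    """Return up to n_comp PLS1 score vectors (NIPALS with deflation)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    scores = np.empty((X.shape[0], 0))
    for _ in range(n_comp):
        w = Xk.T @ yk                      # weight vector: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to extract
            break
        w = w / norm
        t = Xk @ w                         # score vector of the new component
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - t * (t @ yk) / tt        # deflate y
        scores = np.column_stack([scores, t])
    return scores


def select_n_components(X, y, max_comp=5, alpha=0.05, n_boot=500, seed=0):
    """Hypothetical stopping rule: keep adding components while the bootstrap
    percentile interval of the newest component's coefficient excludes zero."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = 0
    for k in range(1, max_comp + 1):
        T = pls1_scores(X, y, k)
        if T.shape[1] < k:                 # component k could not be built
            break
        coefs = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)          # resample (y, T) pairs
            design = np.column_stack([np.ones(n), T[idx]])
            beta, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
            coefs[b] = beta[-1]                       # coefficient of component k
        lo, hi = np.quantile(coefs, [alpha / 2, 1 - alpha / 2])
        if lo <= 0.0 <= hi:                # component k not significant: stop
            break
        selected = k
    return selected
```

Called as `select_n_components(X, y, alpha=0.05)` with `X` an $n \times p$ array and `y` a length-$n$ response, the function returns the number of retained components. Because nothing in the sketch requires $n>p$, it runs unchanged when $n<p$. Extending it to the PLSGLR setting would require replacing the least-squares fit with the appropriate generalized linear model fit, which is not shown here.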
