Principles of Proper Validation: use and abuse of re‐sampling for validation

Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to the formulation of a set of generic Principles of Proper Validation (PPV), based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; it must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, all of which are outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is only one valid paradigm: test set validation. (iii) Contrary to much contemporary chemometric practice (and validation myths), cross-validation is shown to be unjustified when applied monolithically as a one-for-all procedure (segmented cross-validation) to all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, because it is manifestly based on one data set only (the training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which an assessment of performance is desired: prediction, classification, time series forecasting and modeling validation. The key element of the PPV is the Theory of Sampling (TOS), which allows insight into all variance-generating factors, especially the so-called incorrect sampling errors; if not properly eliminated, these errors are responsible for a fatal, inconstant sampling bias for which no statistical correction is possible. In the light of the TOS it is shown why a second data set (test set, validation set) is critically necessary to include the sampling errors that will be incurred in all 'future' situations in which the validated model must perform. Logically, therefore, all one-data-set re-sampling approaches to validation, especially cross-validation and leverage-corrected validation, should be abandoned, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS regression, an emphatic call is made for a stringent commitment to test set validation based on graphical inspection of the pertinent t–u plots, for optimal understanding of the X–Y interrelationships and for validation guidance. QSAR/QSAP modeling, with no generalization potential beyond the data at hand, forms a partial exemption from the present test set imperative. Copyright © 2010 John Wiley & Sons, Ltd.
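The contrast between one-data-set re-sampling and genuine test set validation can be made concrete with a small numerical sketch. The Python example below is not from the paper: the simulated data, the bias and noise levels, and the choice of five PLS components are hypothetical placeholders chosen only for illustration. It compares RMSECV from segmented cross-validation on a single training set with RMSEP on an independently drawn test set whose sampling conditions differ, mimicking the 'future' sampling errors that re-sampling of the training data alone cannot capture.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def simulate_batch(n, bias=0.0, noise=0.1):
    """Simulate one sampling campaign: spectra-like X and reference values y.
    A non-zero `bias` mimics an inconstant sampling bias in a later campaign
    (hypothetical stand-in for TOS-type incorrect sampling errors)."""
    X = rng.normal(size=(n, 50))
    y = X[:, :5].sum(axis=1) + rng.normal(scale=noise, size=n) + bias
    return X, y

# One data set only: the training data, as used by cross-validation.
X_train, y_train = simulate_batch(60)

# A second, independently sampled data set: the test set. Its sampling and
# error conditions differ from the training campaign, as they will in use.
X_test, y_test = simulate_batch(40, bias=0.3, noise=0.2)

pls = PLSRegression(n_components=5)

# Segmented (k-fold) cross-validation: re-uses the training set only.
y_cv = cross_val_predict(
    pls, X_train, y_train,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
rmsecv = mean_squared_error(y_train, y_cv) ** 0.5

# Test set validation: the model never sees the test samples or their errors.
pls.fit(X_train, y_train)
rmsep = mean_squared_error(y_test, pls.predict(X_test)) ** 0.5

print(f"RMSECV (one data set, re-sampled): {rmsecv:.3f}")
print(f"RMSEP  (independent test set):     {rmsep:.3f}")

Because the cross-validated error is computed entirely within the single training campaign, it omits the between-campaign sampling variance and bias and will typically be optimistic relative to the test set error; this is the variance omission that the PPV identify as the central weakness of all one-data-set re-sampling validation.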
