How to avoid over-fitting in multivariate calibration--the conventional validation approach and an alternative.

This paper critically reviews the problem of over-fitting in multivariate calibration and the conventional validation-based approach to avoid it. It proposes a randomization test that enables one to assess the statistical significance of each component that enters the model. This alternative is compared with cross-validation and independent test set validation for the calibration of a near-infrared spectral data set using partial least squares (PLS) regression. The results indicate that the alternative approach is more objective, since, unlike the validation-based approach, it does not require the use of 'soft' decision rules. The alternative approach therefore appears to be a useful addition to the chemometrician's toolbox.
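The idea of testing each PLS component for significance by randomization can be illustrated with a small permutation-style sketch. This is not the authors' procedure. It assumes a single response (PLS1), NIPALS deflation of X, and a test statistic equal to the squared norm of X_a^T y, where X_a is X after the previous components have been removed. For PLS1 that norm is the maximized squared covariance between component a's scores and y, up to the constant factor (n - 1)^2. The function name `pls1_randomization_test` and the parameters `n_perm` and `seed` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pls1_randomization_test(X, y, max_components, n_perm=999, seed=0):
    """Hypothetical sketch: permutation p-value for each successive PLS1 component.

    Assumed statistic for component a: ||X_a^T y||^2, with X_a the deflated X.
    This equals the squared covariance that component a maximizes, up to (n-1)^2.
    """
    rng = np.random.default_rng(seed)
    Xa = X - X.mean(axis=0)  # column-centred predictors
    yc = y - y.mean()        # centred response
    p_values = []
    for _ in range(max_components):
        observed = float(np.sum((Xa.T @ yc) ** 2))
        # Count random reorderings of y that do at least as well by chance
        exceed = 0
        for _ in range(n_perm):
            y_perm = rng.permutation(yc)
            if np.sum((Xa.T @ y_perm) ** 2) >= observed:
                exceed += 1
        # Add-one correction so the p-value can never be exactly zero
        p_values.append((exceed + 1) / (n_perm + 1))
        # NIPALS step: weights, scores and loadings, then deflate X.
        # y is not deflated: the deflated X is orthogonal to earlier scores,
        # so X_a^T y already equals X_a^T y_a.
        w = Xa.T @ yc
        w /= np.linalg.norm(w)
        t = Xa @ w
        p = Xa.T @ t / (t @ t)
        Xa = Xa - np.outer(t, p)
    return p_values
```

Under these assumptions, one would retain components up to the first one whose p-value exceeds a chosen significance level, with 0.05 as a common choice. This rule is a conventional statistical cut-off rather than the visual or "soft" inspection of a cross-validation error curve.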

N. M. Faber, R. Rajkó
