DOUBLE-CASE DIAGNOSTIC FOR OUTLIERS IDENTIFICATION

Abstract

The question of calibration-model robustness motivated the implementation of the jackknife estimator. This work illustrates the theory and concepts of the jackknife estimator. Its objectives were twofold. The first was to exploit the estimator to produce more robust regression models, for example by estimating model bias and then using that estimate to correct the model. The second was to evaluate the potential of the jackknife for estimating multivariate regression models. The paper describes the program written to implement a double-jackknife estimator, which was applied to multivariate calibration models derived from selected data sets. These data sets, taken from the published literature and supplemented by a ‘real-world’ data set, were used to test, validate and evaluate the jackknife routine. Partial least squares (PLS) regression was the method of choice for the data sets tested, but other least-squares regression models were employed for comparison. The main conclusions of this study are that the double-jackknife procedure leads to over-conservative estimates of the precision of PLS predictions, but that it can serve as a useful tool for highlighting the presence of outlier objects in a data set.
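The bias estimate and bias correction mentioned above follow the standard delete-one jackknife: refit the estimator n times with one case left out, then combine the leave-one-out estimates. The Python sketch below is illustrative only and is not the authors' program. It assumes a scalar statistic and uses an ordinary least-squares slope as a stand-in for a PLS coefficient. It implements the single jackknife, not the double (two-level) procedure studied here. The names `jackknife`, `ols_slope` and `most_influential` are hypothetical. It also shows how pseudo-values can flag an unusually influential case, which is the kind of outlier screening the abstract refers to.

```python
from statistics import fmean, pvariance, variance
from typing import Callable, List, NamedTuple, Sequence, Tuple


class JackknifeResult(NamedTuple):
    estimate: float            # estimator applied to all n cases
    bias: float                # jackknife estimate of the estimator's bias
    corrected: float           # bias-corrected estimate
    std_error: float           # jackknife standard error
    pseudo_values: List[float]


def jackknife(data: Sequence, estimator: Callable[[Sequence], float]) -> JackknifeResult:
    """Delete-one jackknife for a scalar statistic (illustrative sketch)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 cases")
    full = estimator(data)
    # Refit the estimator n times, each time leaving out case i.
    loo = [estimator(list(data[:i]) + list(data[i + 1:])) for i in range(n)]
    mean_loo = fmean(loo)
    bias = (n - 1) * (mean_loo - full)
    corrected = full - bias  # equivalently n*full - (n-1)*mean_loo
    std_error = ((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)) ** 0.5
    # Pseudo-values: a case whose pseudo-value sits far from the others
    # has an unusually strong influence on the estimate.
    pseudo = [n * full - (n - 1) * t for t in loo]
    return JackknifeResult(full, bias, corrected, std_error, pseudo)


def ols_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of y on x (hypothetical stand-in for a PLS coefficient)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def most_influential(pseudo: Sequence[float]) -> int:
    """Index of the case whose pseudo-value deviates most from their mean."""
    centre = fmean(pseudo)
    return max(range(len(pseudo)), key=lambda i: abs(pseudo[i] - centre))


if __name__ == "__main__":
    # Bias correction: the plug-in variance (divisor n) is biased, and its
    # jackknife correction equals the unbiased sample variance (divisor n - 1).
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    res = jackknife(sample, pvariance)
    print(res.estimate, res.corrected, variance(sample))

    # Influence screening: the sixth point lies far off the line y = x.
    pts = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 20)]
    res = jackknife(pts, ols_slope)
    print(res.estimate, res.std_error, most_influential(res.pseudo_values))
```

The variance example is a useful sanity check because the jackknife removes the O(1/n) bias of the plug-in variance exactly. In the slope example, the leave-one-out fit without the sixth point differs most from the others, so that point's pseudo-value is the most extreme.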
