A method for validation of reference sets in SIMCA modelling

A method is proposed for validating the reference set in Soft Independent Modelling of Class Analogies (SIMCA). The SIMCA model is built from the reference set, and the remaining samples are then fitted to this model, so the reference set must be representative of the reference class. This work suggests validating the reference set with the jackknife procedure: the jackknife estimate of the standard error for the reference set is determined by successively leaving one sample out. An optimal reference set should minimise this standard error, but the minimisation must be balanced against the loss of variation span in the reference set, so that the reference class does not become too narrow. The reference set is optimised by changing its composition. The suggested validation method is tested on two data sets from environmental monitoring surveys of oil fields in the North Sea.
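
The leave-one-out computation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which statistic is jackknifed, so the sketch assumes the residual standard deviation of a mean-centred principal component model fitted to the reference set. The function names `pca_residual_sd` and `jackknife_se`, the number of components, and the simulated data are all hypothetical.

```python
import numpy as np


def pca_residual_sd(X, n_components=1):
    """Residual standard deviation of X around a mean-centred PCA model.

    Assumption: this stands in for whatever statistic the paper jackknifes.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal component loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    E = Xc - Xc @ P @ P.T  # residuals after projecting onto the model
    dof = (n - n_components - 1) * (p - n_components)
    return np.sqrt((E ** 2).sum() / dof)


def jackknife_se(X, statistic):
    """Jackknife estimate of the standard error of statistic(X).

    se = sqrt((n - 1) / n * sum_i (theta_i - mean(theta))^2),
    where theta_i is the statistic with sample (row) i left out.
    """
    n = X.shape[0]
    loo = np.array([statistic(np.delete(X, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())


# Hypothetical reference set: 12 samples, 5 variables, simulated data.
rng = np.random.default_rng(0)
reference = rng.normal(size=(12, 5))
se_full = jackknife_se(reference, pca_residual_sd)

# Candidate reference sets: drop each sample in turn and recompute the
# jackknife standard error of the reduced set.
se_without = [
    jackknife_se(np.delete(reference, i, axis=0), pca_residual_sd)
    for i in range(reference.shape[0])
]
print(se_full, int(np.argmin(se_without)))
```

Comparing `se_without` with `se_full` shows which change in composition lowers the standard error most. As the abstract notes, any such reduction would still have to be weighed against the loss of variation span in the reference set, which this sketch does not quantify.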
