Chapter 7 SIMCA - Classification by Means of Disjoint Cross Validated Principal Components Models

Publisher Summary Soft Independent Modelling of Class Analogies (SIMCA) is a classification method in which each class of samples is described by its own principal component model. Thus, in principle, the models can accommodate any degree of data collinearity. The chapter opens with a discussion of the important role played by correlation when assessing similarity, and introduces the properties of principal component modelling that are relevant to a classification problem. All basic concepts and steps in the SIMCA approach to supervised modelling are thoroughly explored using chemical data obtained in an environmental study. The definition of distance is central to all classification procedures. Euclidean distance in variable space is the most commonly used measure of similarity between samples, and it is presented in two-dimensional space. Principal component modelling plays two different roles in the classification of multivariate data: (1) it is a tool for data reduction, yielding low-dimensional orthogonal representations of the multivariate variable and object spaces in which object and variable relationships can be explored; and (2) in the SIMCA method, it separates the data into a model and a residual matrix, from which a scale can be obtained for later classification of samples. Sometimes SIMCA classification is preceded by unsupervised principal component modelling of the whole data set. The process of detecting and deleting outliers represents one side of the process termed “polishing of classes.”
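The second role described in the summary, splitting each class into a model part and a residual matrix and using the residuals as a scale, can be illustrated with a minimal sketch. It is not the chapter's implementation. The function names (`fit_class_model`, `residual_sd`), the synthetic data, and the number of components are assumptions made for illustration. In particular, the number of components is fixed by hand here, whereas the chapter title indicates it is chosen by cross validation, and the degrees-of-freedom corrections follow one common SIMCA convention.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a principal component model to the samples of ONE class."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal component loadings of the centred class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                # loadings, shape (n_vars, n_components)
    E = Xc - Xc @ P @ P.T                  # residual matrix of the class
    n, m = X.shape
    # Assumed convention: (n - A - 1)(m - A) degrees of freedom for the class.
    dof = (n - n_components - 1) * (m - n_components)
    s0 = np.sqrt((E ** 2).sum() / dof)     # residual standard deviation (the "scale")
    return {"mean": mean, "P": P, "s0": s0, "A": n_components}

def residual_sd(model, x):
    """Residual standard deviation of one sample fitted to a class model."""
    e = x - model["mean"]
    e = e - model["P"] @ (model["P"].T @ e)    # remove the part explained by the model
    return np.sqrt((e ** 2).sum() / (x.shape[0] - model["A"]))

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
t = rng.normal(size=(20, 1))
class_a = t @ np.array([[1.0, 1.0, 0.0]]) + 0.1 * rng.normal(size=(20, 3))
u = rng.normal(size=(20, 1))
class_b = (np.array([0.0, 0.0, 5.0])
           + u @ np.array([[1.0, -1.0, 0.0]])
           + 0.1 * rng.normal(size=(20, 3)))

models = {"A": fit_class_model(class_a, 1), "B": fit_class_model(class_b, 1)}

# Compare a new sample's residual with each class scale; a smaller ratio means a better fit.
x_new = np.array([2.0, 2.0, 0.0])
ratios = {name: residual_sd(m, x_new) / m["s0"] for name, m in models.items()}
best = min(ratios, key=ratios.get)
```

Because each class has its own model, a sample can fit one class, several classes, or none. In practice the squared ratio is usually compared with an F-test critical value to decide class membership, rather than simply picking the smallest ratio as this sketch does.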
