Disjoint hard models for classification

The paper describes a new approach to disjoint hard modelling of classes. Independent principal component (PC) models are developed for each class, and two statistics are calculated for every sample: the Q statistic (squared prediction error) against the class model, and a second statistic describing how well the sample is classified within the projected PC space. The latter statistic can be based on different types of classifier; in this paper it is illustrated using Quadratic Discriminant Analysis (the D statistic) and one-class Support Vector Domain Description (SVDD) (the f-value). The two measures (Q and the classifier-dependent statistic) are combined into a joint decision function that uniquely classifies each sample. The disjoint hard models are contrasted with conjoint models, in which PCA is performed on the entire dataset and QDA or Support Vector Machine (SVM) classifiers are applied. The optimum number of PCs for each model is determined using the bootstrap, and model performance is assessed on 100 test sets obtained from different iterative splits, using %PA (predictive ability) and %CR (classification rate). The method is illustrated on a dataset of 293 samples from nine groups of polymers characterised by thermal profiling. The approach described in this paper combines many of the advantages of one-class disjoint models (e.g. SIMCA) and of conventional hard models, and is useful when it is known that every sample must belong to one of a series of known groups but each group has a very different structure. Copyright © 2010 John Wiley & Sons, Ltd.
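To make the workflow concrete, the sketch below shows one plausible realisation of the disjoint hard model with a per-class PC model, a Q statistic (squared residual to the class model) and a QDA-style D statistic (Mahalanobis distance in the class score space). The class name, the per-class scaling of Q, and the additive way the two statistics are combined into a single decision are illustrative assumptions, not the paper's exact joint decision function; likewise, the number of PCs per class is passed in directly here, whereas the paper selects it by the bootstrap.

```python
# Minimal sketch of disjoint hard modelling: one PCA per class, a Q statistic
# (squared prediction error) and a D statistic (Mahalanobis distance of the
# scores to the class centre). The scaling of Q and the additive combination
# of Q and D below are assumptions made for illustration only.

import numpy as np
from sklearn.decomposition import PCA


class DisjointPCAClassifier:
    def __init__(self, n_components_per_class):
        # n_components_per_class: dict {class_label: number of PCs}
        # (in the paper the number of PCs per class is chosen by the bootstrap)
        self.n_components = n_components_per_class
        self.models = {}

    def fit(self, X, y):
        for g in np.unique(y):
            Xg = X[y == g]
            pca = PCA(n_components=self.n_components[g]).fit(Xg)
            scores = pca.transform(Xg)
            self.models[g] = {
                "pca": pca,
                "mean_t": scores.mean(axis=0),
                "cov_t": np.cov(scores, rowvar=False),
                # illustrative scale factor to put Q and D on comparable
                # footings (not necessarily the paper's choice)
                "q_scale": self._q(Xg, pca).mean() + 1e-12,
            }
        return self

    @staticmethod
    def _q(X, pca):
        # Q statistic: squared residual after projecting onto the class PC model
        X_hat = pca.inverse_transform(pca.transform(X))
        return np.sum((X - X_hat) ** 2, axis=1)

    def _d(self, X, m):
        # D statistic: squared Mahalanobis distance of the scores to the class
        # centre within the projected PC space (QDA-like, per-class covariance)
        t = m["pca"].transform(X) - m["mean_t"]
        cov_inv = np.linalg.pinv(np.atleast_2d(m["cov_t"]))
        return np.einsum("ij,jk,ik->i", t, cov_inv, t)

    def predict(self, X):
        # Joint decision: assign each sample to the class with the smallest
        # combined (scaled Q + D) value; the additive form is an assumption.
        labels = list(self.models)
        combined = np.column_stack([
            self._q(X, self.models[g]["pca"]) / self.models[g]["q_scale"]
            + self._d(X, self.models[g])
            for g in labels
        ])
        return np.array(labels)[np.argmin(combined, axis=1)]
```

Under these assumptions, a call such as DisjointPCAClassifier({0: 3, 1: 2}).fit(X_train, y_train).predict(X_test) assigns every test sample to exactly one class, which is the "hard" aspect of the model; a SIMCA-style soft variant would instead compare Q and D against per-class confidence limits and allow samples to belong to none or several classes.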
