One class classifiers for process monitoring illustrated by the application to online HPLC of a continuous process

In process monitoring, a representative out‐of‐control class of samples cannot be generated. Here, it is assumed that a representative subset of samples can be obtained from a single ‘in‐control’ class. One‐class classifiers, namely the Q and D statistics (respectively, the residual distance to the disjoint PC model and the Mahalanobis distance to the centre of the QDA model in the projected PC space) and support vector domain description (SVDD), are applied to disjoint PC models of the normal operating conditions (NOC) region to categorise whether the process is in control or out of control. To define the NOC region, the cumulative relative standard deviation (CRSD) and a test of multivariate normality are described and used as joint criteria. These calculations were based on window principal components analysis (WPCA), which can be used to define a NOC region. The D and Q statistics and SVDD models were calculated for the NOC region, and the percentage predictive ability (%PA), percentage model stability (%MS) and percentage correctly classified (%CC) were obtained from 100 training/test set splits to determine the quality of the models. Q, D and SVDD control charts were obtained, and 90% confidence limits were set up based on multivariate normality (D and Q) or on the SVDD D value (which does not require assumptions of normality). We introduce a method for finding an optimal radial basis function for the SVDD model, and we also define two new indices for non‐NOC samples: the percentage classification index (%CI) and the percentage predictive index (%PI). The methods in this paper are exemplified by a continuous process studied over 105.11 h using online HPLC. Copyright © 2010 John Wiley & Sons, Ltd.
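As a rough illustration of the two distance‐based one‐class classifiers, the sketch below fits a PCA model to NOC samples only and computes, for each new sample, Q (taken here as the sum of squared residuals, a common convention that may differ in scaling from the paper) and D (the Mahalanobis distance of the scores to the model centre). It is a minimal sketch in Python with NumPy under stated assumptions: the data are synthetic, the function names `fit_disjoint_pc_model` and `q_and_d` are hypothetical, the number of components is fixed rather than chosen by CRSD or a normality test, and the 90% limits are empirical percentiles of the NOC values. These limits are a swapped‐in choice, not the normality‐based limits used in the paper.

```python
import numpy as np


def fit_disjoint_pc_model(X_noc, n_components):
    """Fit a PCA model to in-control (NOC) samples only; no out-of-control class is needed."""
    mean = X_noc.mean(axis=0)
    Xc = X_noc - mean
    # Rows of Vt are the principal component loadings, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (n_vars, A)
    T = Xc @ P                                   # NOC scores, centred at the origin
    cov_T = np.atleast_2d(np.cov(T, rowvar=False))
    return {"mean": mean, "P": P, "cov_inv": np.linalg.inv(cov_T)}


def q_and_d(model, X):
    """Return (Q, D) for each row of X relative to a fitted disjoint PC model."""
    Xc = X - model["mean"]
    T = Xc @ model["P"]                          # project onto the NOC PC space
    E = Xc - T @ model["P"].T                    # residual not explained by the model
    Q = np.sum(E ** 2, axis=1)                   # squared residual distance (SPE)
    # Mahalanobis distance of the scores to the model centre (the origin).
    D = np.sqrt(np.einsum("ij,jk,ik->i", T, model["cov_inv"], T))
    return Q, D


# Synthetic NOC data: 200 samples, 10 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
X_noc = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
X_noc += 0.05 * rng.normal(size=(200, 10))

model = fit_disjoint_pc_model(X_noc, n_components=2)
Q_tr, D_tr = q_and_d(model, X_noc)

# Empirical 90% limits from the NOC samples (NOT the normality-based limits of the paper).
q_lim, d_lim = np.percentile(Q_tr, 90), np.percentile(D_tr, 90)

# A simulated fault: a shift on one variable breaks the NOC correlation structure.
x_fault = X_noc[:1].copy()
x_fault[0, 9] += 3.0
Q_fault, D_fault = q_and_d(model, x_fault)
print(f"Q = {Q_fault[0]:.3f} (limit {q_lim:.3f}), D = {D_fault[0]:.3f} (limit {d_lim:.3f})")
```

A sample exceeding either limit would be flagged as out of control. The simulated fault mainly inflates Q, because the shift lies largely outside the NOC PC space; a fault that moved samples within that space would instead show up in D. SVDD is not sketched because it requires solving a quadratic programme.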
