Matching patterns from historical data using PCA and distance similarity factors

The diagnosis of abnormal plant operation can be greatly facilitated if periods of similar plant performance can be located in the historical database. A novel methodology is proposed for this pattern matching problem. The new approach provides a preliminary screening of large amounts of historical data in order to generate a candidate pool of similar periods of operation. This much smaller number of records can then be further evaluated by someone familiar with the process. Similarity factors are used to characterize the degree of similarity between the current abnormal operation and historical data. A new distance similarity factor is proposed that complements the standard PCA similarity factor. The two similarity factors provide the basis for an unsupervised pattern matching technique. The proposed pattern matching methodology has been evaluated in a detailed case study for a controlled CSTR (14 measured variables, more than 474,000 data points for each measured variable, and 19 operating modes/faults). The proposed methodology was able to locate over 90% of the previous occurrences of "abnormal situations".
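The abstract does not define the similarity factors. As a minimal sketch, assuming the "standard PCA similarity factor" is Krzanowski's between-groups measure (the average squared cosine of the angles between two sets of k principal components), it could be computed as below. The function name `pca_similarity` and the list-of-vectors input format are illustrative assumptions, not taken from the paper, and the loading vectors are assumed to be already computed and orthonormal. The proposed distance similarity factor is not sketched because the abstract gives no formula for it.

```python
def pca_similarity(loadings_a, loadings_b):
    """Average squared cosine between two sets of k orthonormal loading vectors.

    Each argument is a list of k vectors (lists of floats) of equal dimension.
    Returns a value in [0, 1]; 1 means the two subspaces coincide.
    Hypothetical helper for illustration only.
    """
    k = len(loadings_a)
    if k == 0 or len(loadings_b) != k:
        raise ValueError("both sets must contain the same nonzero number of vectors")
    dim = len(loadings_a[0])
    if any(len(v) != dim for v in loadings_a + loadings_b):
        raise ValueError("all loading vectors must have the same dimension")

    total = 0.0
    for a in loadings_a:
        for b in loadings_b:
            # For unit vectors the dot product is the cosine of the angle.
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return total / k


# Two datasets whose first two principal components span the same plane.
current = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# A historical window sharing only one direction with the current data.
partial = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

s_same = pca_similarity(current, same_plane)   # 1.0
s_partial = pca_similarity(current, partial)   # 0.5
```

In a screening workflow of the kind the abstract describes, a factor like this would be evaluated for each window of historical data, and the windows scoring highest would form the candidate pool for manual review.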
