Fault diagnosis of chemical processes with incomplete observations: A comparative study

Diagnostic systems in industrial applications must be able to estimate faults from incomplete observations. This work discusses different approaches to handling missing data and the performance of data-driven fault diagnosis schemes. An exploiting classifier and several combined methods were assessed on the Tennessee–Eastman process, for which diverse sets of incomplete observations were generated. Several performance indicators revealed the trade-offs among the different schemes. Support vector machines (SVM) and C4.5, each combined with k-nearest neighbours (kNN), produce the highest robustness and accuracy, respectively. Bayesian networks (BN) and the centroid classifier appear to be inappropriate options in terms of accuracy, while Gaussian naive Bayes (GNB) is sensitive to the imputed values. In addition, feature selection was explored for further performance enhancement, and the proposed contribution index showed promising results. Finally, an industrial case was studied to assess the information level of incomplete data in terms of the redundancy ratio and to generalize the discussion.
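As a rough illustration of what a combined method of this kind might look like, the sketch below imputes missing measurements with kNN and then passes the completed sample to a classifier. It is not the implementation used in this study: the function names, the toy data and the choice of k are all assumptions, and a nearest-centroid classifier stands in for SVM or C4.5 only to keep the example dependency-free.

```python
import math

def knn_impute(sample, complete_rows, k=3):
    """Fill None entries of `sample` with the mean of its k nearest complete rows.

    Distance is Euclidean, computed only over the features observed in `sample`.
    (Hypothetical helper, not taken from the paper.)
    """
    observed = [i for i, v in enumerate(sample) if v is not None]

    def dist(row):
        return math.sqrt(sum((sample[i] - row[i]) ** 2 for i in observed))

    neighbours = sorted(complete_rows, key=dist)[:k]
    return [
        v if v is not None else sum(r[i] for r in neighbours) / len(neighbours)
        for i, v in enumerate(sample)
    ]

def fit_centroids(rows, labels):
    """Return the per-class mean vector (stand-in classifier, assumed for brevity)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(sample, centroids):
    """Assign the class whose centroid is closest to the completed sample."""
    return min(centroids, key=lambda lab: math.dist(sample, centroids[lab]))

# Toy training data (invented): three "normal" and three "fault" observations.
train = [
    [0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [0.1, 0.2, 0.1],
    [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.9, 5.0, 5.1],
]
labels = ["normal"] * 3 + ["fault"] * 3

incomplete = [5.1, None, 5.0]          # second measurement is missing
filled = knn_impute(incomplete, train, k=3)
prediction = predict(filled, fit_centroids(train, labels))
print(filled, prediction)
```

Keeping imputation and classification as separate steps means the classifier can be swapped without touching the imputation code, which is the kind of pairing the comparison above evaluates.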
