Optimal Feature Selection for Designing a Fault Diagnosis System

Abstract Fault diagnosis (FD) using data-driven methods is essential for monitoring complex process systems, but its performance is strongly affected by the quality of the information used. Additionally, processing the huge amounts of data recorded by modern monitoring systems may be complex and time-consuming if no data mining and/or pre-processing methods are employed. Thus, feature selection for FD is advisable in order to determine the optimal subset of features/variables for conducting statistical analyses or building a machine-learning model. In this work, feature selection is formulated as an optimization problem. Several relevancy indices, such as Maximum Relevance (MR), Value Difference Metric (VDM), and Fit Criterion (FC), and redundancy indices, such as Minimum Redundancy (mR), Redundancy VDM (RVDM), and Redundancy Fit Criterion (RFC), are combined to determine the optimal subset of features. Another approach to feature selection is based on the optimal performance of the classifier, achieved by wrapping a classifier with a genetic algorithm. The efficiency of this strategy is explored considering different classifiers, namely Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gaussian Naive Bayes (GNB). A Genetic Algorithm (GA), as a Derivative-Free Optimization (DFO) technique, has been used due to its robustness in dealing with different kinds of problems. The obtained optimal subset of features has been tested with SVM, DT, KNN, and GNB on the Tennessee-Eastman process benchmark with 19 classes. Results show that, when the performance of the classifier is used as the objective function, the wrapper method obtains the best feature set.
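
The following Python sketch illustrates the wrapper idea summarized above under simple assumptions: a binary chromosome encodes which features are kept, and the fitness of each individual is the cross-validated accuracy of the wrapped classifier (here a scikit-learn SVM on synthetic data standing in for the Tennessee-Eastman records). All function names, GA settings, and data in this sketch are illustrative assumptions, not the implementation used in this work.

```python
# Minimal sketch: GA-wrapped feature selection with cross-validated accuracy as fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the monitored process data (not the TEP benchmark itself).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of the wrapped classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=3).mean()

def ga_feature_selection(n_features, pop_size=20, generations=30,
                         crossover_p=0.8, mutation_p=0.05):
    # Each individual is a 0/1 vector marking which features are included.
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Binary tournament selection of parents.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        parents = pop[np.where(scores[idx[:, 0]] > scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # Single-point crossover on consecutive pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_p:
                cut = rng.integers(1, n_features)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation.
        flip = rng.random(children.shape) < mutation_p
        children[flip] ^= 1
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()], scores.max()

best_mask, best_acc = ga_feature_selection(X.shape[1])
print("selected features:", np.flatnonzero(best_mask), "cv accuracy:", round(best_acc, 3))
```

Swapping SVC for DecisionTreeClassifier, KNeighborsClassifier, or GaussianNB changes only the fitness evaluation, which is what makes the wrapper formulation classifier-agnostic.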