Optimal Feature Selection for Designing a Fault Diagnosis System

Abstract Fault diagnosis (FD) using data-driven methods is essential for monitoring complex process systems, but its performance is strongly affected by the quality of the information used. Additionally, processing the huge amounts of data recorded by modern monitoring systems may be complex and time-consuming if no data mining and/or pre-processing methods are employed. Thus, feature selection for FD is advisable in order to determine the optimal subset of features/variables for conducting statistical analyses or building a machine-learning model. In this work, feature selection is formulated as an optimization problem. Several relevancy indices, such as Maximum Relevance (MR), Value Difference Metric (VDM), and Fit Criterion (FC), and redundancy indices, such as Minimum Redundancy (mR), Redundancy VDM (RVDM), and Redundancy Fit Criterion (RFC), are combined to determine the optimal subset of features. Another approach to feature selection is based on the optimal performance of the classifier, achieved by wrapping a classifier with a genetic algorithm. The efficiency of this strategy is explored considering different classifiers, namely Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gaussian Naive Bayes (GNB). A Genetic Algorithm (GA), as a Derivative-Free Optimization (DFO) technique, has been used due to its robustness in dealing with different kinds of problems. The obtained optimal subset of features has been tested with SVM, DT, KNN, and GNB on the Tennessee-Eastman process benchmark with 19 classes. Results show that, when the performance of the classifier is used as the objective function, the wrapper method obtains the best feature set.
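
The following Python sketch illustrates the wrapper idea summarized above under simple assumptions: a binary chromosome encodes which features are kept, and the fitness of each individual is the cross-validated accuracy of the wrapped classifier (here a scikit-learn SVM on synthetic data standing in for the Tennessee-Eastman records). All function names, GA settings, and data in this sketch are illustrative assumptions, not the implementation used in this work.

```python
# Minimal sketch: GA-wrapped feature selection with cross-validated accuracy as fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the monitored process data (not the TEP benchmark itself).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of the wrapped classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=3).mean()

def ga_feature_selection(n_features, pop_size=20, generations=30,
                         crossover_p=0.8, mutation_p=0.05):
    # Each individual is a 0/1 vector marking which features are included.
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Binary tournament selection of parents.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        parents = pop[np.where(scores[idx[:, 0]] > scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # Single-point crossover on consecutive pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_p:
                cut = rng.integers(1, n_features)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation.
        flip = rng.random(children.shape) < mutation_p
        children[flip] ^= 1
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()], scores.max()

best_mask, best_acc = ga_feature_selection(X.shape[1])
print("selected features:", np.flatnonzero(best_mask), "cv accuracy:", round(best_acc, 3))
```

Swapping SVC for DecisionTreeClassifier, KNeighborsClassifier, or GaussianNB changes only the fitness evaluation, which is what makes the wrapper formulation classifier-agnostic.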