Differential evolution for filter feature selection based on information theory and feature ranking

Abstract: Feature selection is an essential step in various tasks, and filter feature selection algorithms are increasingly attractive due to their simplicity and speed. A common filter strategy uses mutual information to estimate the relationship between each feature and the class labels (mutual relevance) and between each pair of features (mutual redundancy). This strategy has gained popularity, resulting in a variety of criteria based on mutual information. Other well-known strategies rank features by nearest-neighbor distance, as in ReliefF, or by the between-class and within-class variance, as in Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF and Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest-ranked features determined by ReliefF and Fisher Score while retaining the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. The former treats the proposed criterion as a single-objective problem in a weighted manner, while the latter considers the proposed criterion in a multi-objective design. Moreover, a well-known mutual information feature selection approach (MIFS) based on maximum relevance and minimum redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single-objective and multi-objective DE frameworks. The results also indicate that treating feature selection as a multi-objective problem can generally provide better performance in terms of feature subset size and classification accuracy.
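The abstract does not give the exact form of the proposed criterion, so the following is only an illustrative sketch of the general idea: a weighted single-objective fitness that combines mutual relevance with a ranking score, searched by a basic DE/rand/1/bin loop. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the weight `alpha`, the use of the mean over selected features, the 0.5 threshold that maps a real-valued DE vector to a feature mask, and the DE settings `F` and `CR`. ReliefF is omitted for brevity; only a Fisher-style score is shown as the ranking term.

```python
import numpy as np


def mutual_information(x, y):
    """I(X;Y) in bits for two discrete 1-D arrays, estimated by counting."""
    _, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0)      # joint counts
    joint /= joint.sum()                        # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                              # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def fisher_score(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)           # small constant avoids 0/0


def weighted_fitness(mask, relevance, rank_score, alpha=0.5):
    """Hypothetical weighted criterion: higher is better."""
    if not mask.any():
        return -np.inf                          # empty subsets are invalid
    return alpha * relevance[mask].mean() + (1 - alpha) * rank_score[mask].mean()


def de_select(relevance, rank_score, pop_size=20, gens=50, F=0.5, CR=0.9, seed=0):
    """Basic DE/rand/1/bin over [0, 1]^d; a gene > 0.5 selects the feature."""
    rng = np.random.default_rng(seed)
    d = len(relevance)
    pop = rng.random((pop_size, d))
    fit = np.array([weighted_fitness(p > 0.5, relevance, rank_score) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True       # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f = weighted_fitness(trial > 0.5, relevance, rank_score)
            if f >= fit[i]:                     # greedy one-to-one replacement
                pop[i], fit[i] = trial, f
    best = int(np.argmax(fit))
    return pop[best] > 0.5, float(fit[best])
```

In use, `relevance` would hold one `mutual_information(X[:, j], y)` value per (discretized) feature and `rank_score` the Fisher scores rescaled to [0, 1], so the two terms are on comparable scales. A multi-objective variant would keep the two terms as separate objectives instead of summing them; that is not shown here.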
