Mutual Information Estimation for Filter-Based Feature Selection Using Particle Swarm Optimization

Feature selection is a pre-processing step in classification that selects a small set of important features to improve classification performance and efficiency. Mutual information is popular in feature selection because it can detect non-linear relationships between features. However, existing mutual information approaches consider only two-way interactions between features. In addition, most methods calculate mutual information by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function using nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and with two traditional feature selection approaches. The experimental results show that the mutual information estimation successfully guides PSO to search for a small number of features while maintaining or improving classification performance over using all features and over the traditional feature selection methods. In addition, PSOMIE provides strong consistency between training and test results, which may help avoid overfitting.
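To illustrate the nearest-neighbor idea behind such estimators (this is the well-known Kraskov–Stögbauer–Grassberger estimator for two continuous variables, offered as a rough sketch, not the paper's exact NNE formulation), mutual information can be estimated from k-nearest-neighbor counts rather than from histogram bins. The function name `ksg_mi` and the default `k=3` below are illustrative choices:

```python
import math
import random

def digamma(n):
    # Exact digamma for positive integers: psi(n) = -gamma + H_{n-1}.
    # All arguments in the KSG formula are integers, so this suffices.
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, n))

def ksg_mi(xs, ys, k=3):
    """KSG estimate of I(X;Y) from paired samples, using the max-norm."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to all other points in joint space.
        dists = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count points strictly within eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma(k) + digamma(n) - digamma(nx + 1) - digamma(ny + 1)
    return max(total / n, 0.0)  # MI is non-negative; clamp estimation noise
```

Unlike binned counting, this estimator adapts its resolution to the local density of the data, which is the main reason nearest-neighbor estimates tend to be more accurate on continuous features.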

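The search side can be sketched with the standard sigmoid-binarized PSO of Kennedy and Eberhart, where each particle is a bit mask over features and the fitness function scores the selected subset (in PSOMIE's case, via the NNE-based measure). Everything below — the function name, swarm size, and coefficient values — is an illustrative sketch, not the paper's exact configuration:

```python
import math
import random

def binary_pso(fitness, n_features, swarm=20, iters=50,
               w=0.7298, c1=1.49618, c2=1.49618):
    """Maximize `fitness` over bit masks of length n_features."""
    # Random initial bit masks; velocities are real-valued.
    pos = [[random.random() < 0.5 for _ in range(n_features)]
           for _ in range(swarm)]
    vel = [[0.0] * n_features for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pfit = [fitness(p) for p in pos]
    g = max(range(swarm), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                # Sigmoid of the velocity gives the probability of bit d being 1.
                pos[i][d] = random.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            f = fitness(pos[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return gbest, gfit
```

In a filter setting, `fitness` would combine a subset-quality score (such as an estimated mutual information between the selected features and the class) with a penalty on the number of selected features, so the swarm is pushed toward small, informative subsets.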