A hybrid approach for optimal feature subset selection with evolutionary algorithms

Feature subset selection is an important preprocessing step for pattern recognition and data mining problems. The selected feature subset is expected to yield the highest possible classification accuracy with the fewest possible features. Optimal feature selection requires a suitable evaluation function and an efficient search method. There are two main approaches. In the filter approach, inherent characteristics of the data set are used for feature evaluation, while in the wrapper approach, classification accuracy is used as the evaluation function. Each approach has its own merits and demerits. In this paper, a suitable combination of the filter and wrapper approaches is proposed for selecting an optimal feature subset with evolutionary algorithms. Correlation-based feature selection (CFS) and minimum redundancy maximum relevance (mRMR) are used as filter evaluation methods, and binary genetic algorithm (BGA) and binary particle swarm optimization (BPSO) are used as evolutionary search algorithms. Simulation experiments are carried out on benchmark data sets. The results show that proper hybridization is effective in selecting a feature subset with a small number of features, high classification accuracy, and low computational cost.
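The filter side of such a hybrid can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), with absolute Pearson correlation standing in for the feature-class and feature-feature measures, and it uses a plain binary GA (tournament selection, one-point crossover, bit-flip mutation, elitism) as the search. The function names, parameter values, and toy data are all hypothetical. A wrapper stage, for example classifier accuracy computed only on the best filter-ranked subsets, is one assumed way to complete the hybrid and is omitted here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, mask):
    """CFS-style merit of the features selected by a 0/1 mask.

    Assumption: absolute Pearson correlation is used for both the
    feature-class and the feature-feature terms.
    """
    idx = [i for i, bit in enumerate(mask) if bit]
    k = len(idx)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], target)) for i in idx) / k
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def binary_ga(fitness, n_bits, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Hypothetical minimal binary GA; requires n_bits >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy data (hypothetical): f1 duplicates f0, f3 is unrelated to the target.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # f0: informative
    [0, 0, 1, 1, 0, 0, 1, 1],  # f1: redundant copy of f0
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: informative, uncorrelated with f0
    [0, 0, 0, 0, 1, 1, 1, 1],  # f3: irrelevant
]
target = [a + b for a, b in zip(columns[0], columns[2])]

best = binary_ga(lambda m: cfs_merit(columns, target, m), n_bits=len(columns))
print(best, round(cfs_merit(columns, target, best), 4))
```

On this toy data the merit penalizes both the redundant pair (f0, f1) and the irrelevant feature f3, so the search is pushed toward a two-feature subset containing f2 and one of f0 or f1. Because the filter score needs no classifier training, it is cheap enough to evaluate for every candidate mask, which is the usual motivation for running it inside the evolutionary loop.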
