A clustering based genetic algorithm for feature selection

Feature selection is a fundamental data preprocessing step in data mining whose goal is to remove irrelevant and/or redundant features from a given dataset. In this paper, we present a clustering-based genetic algorithm for feature selection (CGAFS). The proposed algorithm works in three steps. In the first step, the subset size is determined. In the second step, the features are divided into clusters using the k-means clustering algorithm. In the third step, features are selected using a genetic algorithm with a new clustering-based repair operation. The performance of the proposed method has been assessed on five benchmark classification problems and compared with the results obtained from four existing, well-known feature selection algorithms. The results show that CGAFS consistently achieves better classification accuracy than the compared algorithms.
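To make the third step more concrete, the following is a minimal sketch of what a clustering-based repair operation could look like. The abstract does not specify the operator, so the rule used here is an assumption: a chromosome is a bit list over the features, `clusters` holds each feature's cluster label (for example from k-means), and the repair forces exactly `subset_size` selected features by dropping surplus features from the most crowded cluster and adding missing ones from the least represented cluster. The function name `repair` and all of its parameters are hypothetical.

```python
import random


def repair(chromosome, clusters, subset_size, rng=random):
    """Return a copy of `chromosome` with exactly `subset_size` bits set.

    Hypothetical rule (an assumption, not the paper's stated operator):
    surplus features are dropped from the most crowded cluster, and missing
    features are added from the least represented cluster.
    """
    if not 0 <= subset_size <= len(chromosome):
        raise ValueError("subset_size must be between 0 and the number of features")
    bits = list(chromosome)

    def selected_per_cluster():
        # Count how many selected features each cluster currently contributes.
        counts = {c: 0 for c in clusters}
        for i, bit in enumerate(bits):
            if bit:
                counts[clusters[i]] += 1
        return counts

    # Too many features selected: thin out the most crowded cluster first.
    while sum(bits) > subset_size:
        counts = selected_per_cluster()
        crowded = max(counts, key=counts.get)
        candidates = [i for i, bit in enumerate(bits) if bit and clusters[i] == crowded]
        bits[rng.choice(candidates)] = 0

    # Too few features selected: draw from the least represented cluster
    # that still has an unselected feature.
    while sum(bits) < subset_size:
        counts = selected_per_cluster()
        open_clusters = sorted({clusters[i] for i, bit in enumerate(bits) if not bit})
        sparse = min(open_clusters, key=counts.get)
        candidates = [i for i, bit in enumerate(bits) if not bit and clusters[i] == sparse]
        bits[rng.choice(candidates)] = 1

    return bits


# Illustrative usage with made-up cluster labels for six features.
clusters = [0, 0, 0, 1, 1, 2]
child = [1, 1, 1, 1, 0, 0]  # offspring after crossover/mutation, four features selected
repaired = repair(child, clusters, subset_size=3, rng=random.Random(0))
```

In a full genetic algorithm, such a repair would typically be applied to each offspring after crossover and mutation, so that every evaluated chromosome respects the subset size chosen in the first step while still drawing features from different clusters.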
