A new local search based hybrid genetic algorithm for feature selection

This paper presents a new hybrid genetic algorithm (HGA) for feature selection (FS), called as HGAFS. The vital aspect of this algorithm is the selection of salient feature subset within a reduced size. HGAFS incorporates a new local search operation that is devised and embedded in HGA to fine-tune the search in FS process. The local search technique works on basis of the distinct and informative nature of input features that is computed by their correlation information. The aim is to guide the search process so that the newly generated offsprings can be adjusted by the less correlated (distinct) features consisting of general and special characteristics of a given dataset. Thus, the proposed HGAFS receives the reduced redundancy of information among the selected features. On the other hand, HGAFS emphasizes on selecting a subset of salient features with reduced number using a subset size determination scheme. We have tested our HGAFS on 11 real-world classification datasets having dimensions varying from 8 to 7129. The performances of HGAFS have been compared with the results of other existing ten well-known FS algorithms. It is found that, HGAFS produces consistently better performances on selecting the subsets of salient features with resulting better classification accuracies.

[1]  Marcel J. T. Reinders,et al.  Random subspace method for multivariate feature selection , 2006, Pattern Recognit. Lett..

[2]  Thomas Stützle,et al.  Ant Colony Optimization Theory , 2004 .

[3]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[4]  Huan Liu,et al.  Feature Selection for Classification , 1997, Intell. Data Anal..

[5]  Xiangyang Wang,et al.  Feature selection based on rough sets and particle swarm optimization , 2007, Pattern Recognit. Lett..

[6]  Patrick Tan,et al.  Genetic algorithms applied to multi-class prediction for the analysis of gene expression data , 2003, Bioinform..

[7]  Zuren Feng,et al.  An efficient ant colony optimization approach to attribute reduction in rough set theory , 2008, Pattern Recognit. Lett..

[8]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[9]  Rolly Intan,et al.  Fuzzy Decision Tree Induction Approach for Mining Fuzzy Association Rules , 2009, ICONIP.

[10]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[11]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[12]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[13]  Huan Liu,et al.  Neural-network feature selector , 1997, IEEE Trans. Neural Networks.

[14]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Jin-Kao Hao,et al.  A hybrid LDA and genetic algorithm for gene selection and classification of microarray data , 2010, Neurocomputing.

[16]  Jun Liu,et al.  An Incremental Approach to Contribution-Based Feature Selection , 2004 .

[17]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[18]  Chun-Nan Hsu,et al.  The ANNIGMA-wrapper approach to fast feature selection for neural nets , 2002, IEEE Trans. Syst. Man Cybern. Part B.

[19]  Huan Liu Feature Selection , 2010, Encyclopedia of Machine Learning.

[20]  Naftali Tishby,et al.  Margin based feature selection - theory and algorithms , 2004, ICML.

[21]  Nikhil R. Pal,et al.  A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification , 2004, IEEE Transactions on Neural Networks.

[22]  Kazuyuki Murase,et al.  Involving New Local Search in Hybrid Genetic Algorithm for Feature Selection , 2009, ICONIP.

[23]  Lutz Prechelt,et al.  PROBEN 1 - a set of benchmarks and benchmarking rules for neural network training algorithms , 1994 .

[24]  Mark A. Hall,et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning , 1999, ICML.

[25]  Kazuyuki Murase,et al.  A new wrapper feature selection approach using neural network , 2010, Neurocomputing.

[26]  Nikhil R. Pal,et al.  Genetic programming for simultaneous feature selection and classifier design , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[27]  Huan Liu,et al.  Toward integrating feature selection algorithms for classification and clustering , 2005, IEEE Transactions on Knowledge and Data Engineering.

[28]  Juha Reunanen,et al.  Overfitting in Making Comparisons Between Variable Selection Methods , 2003, J. Mach. Learn. Res..

[29]  M. Esmel ElAlami A filter model for feature subset selection based on genetic algorithm , 2009, Knowl. Based Syst..

[30]  Xuefeng Bruce Ling,et al.  Multiclass cancer classification and biomarker discovery using GA-based algorithms , 2005, Bioinform..

[31]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[32]  Hao Wu,et al.  An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine , 2011, Knowl. Based Syst..

[33]  Mineichi Kudo,et al.  Comparison of algorithms that select features for pattern classifiers , 2000, Pattern Recognit..

[34]  Sreeram Ramakrishnan,et al.  A hybrid approach for feature subset selection using neural networks and ant colony optimization , 2007, Expert Syst. Appl..

[35]  Wotao Yin,et al.  A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression , 2010, J. Mach. Learn. Res..

[36]  Xiaoming Xu,et al.  A hybrid genetic algorithm for feature selection wrapper based on mutual information , 2007, Pattern Recognit. Lett..

[37]  Feng Chu,et al.  A General Wrapper Approach to Selection of Class-Dependent Features , 2008, IEEE Transactions on Neural Networks.

[38]  Alain Rakotomamonjy,et al.  Variable Selection Using SVM-based Criteria , 2003, J. Mach. Learn. Res..

[39]  D. E. Goldberg,et al.  Genetic Algorithms in Search , 1989 .

[40]  Thomas P. Trappenberg,et al.  Selecting inputs for modeling using normalized higher order statistics and independent component analysis , 2001, IEEE Trans. Neural Networks.

[41]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[42]  Yijun Sun,et al.  Iterative RELIEF for Feature Weighting: Algorithms, Theories, and Applications , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Nasser Ghasem-Aghaee,et al.  A novel ACO-GA hybrid algorithm for feature selection in protein function prediction , 2009, Expert Syst. Appl..

[45]  Shigeo Abe,et al.  Modified backward feature selection by cross validation , 2005, ESANN.

[46]  Larry A. Rendell,et al.  A Practical Approach to Feature Selection , 1992, ML.

[47]  J. Stuart Aitken,et al.  Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes , 2005, BMC Bioinformatics.

[48]  Josef Kittler,et al.  Floating search methods in feature selection , 1994, Pattern Recognit. Lett..

[49]  Eduardo Gasca,et al.  Eliminating redundancy and irrelevance using a new MLP-based feature selection method , 2006, Pattern Recognit..

[50]  Jihoon Yang,et al.  Feature Subset Selection Using a Genetic Algorithm , 1998, IEEE Intell. Syst..

[51]  Paul E. Utgoff,et al.  Randomized Variable Elimination , 2002, J. Mach. Learn. Res..

[52]  Kit Yan Chan,et al.  A new orthogonal array based crossover, with analysis of gene interactions, for evolutionary algorithms and its application to car door design , 2010, Expert Syst. Appl..

[53]  Nanda Kambhatla,et al.  Dimension Reduction by Local Principal Component Analysis , 1997, Neural Computation.

[54]  Antanas Verikas,et al.  Feature selection with neural networks , 2002, Pattern Recognit. Lett..

[55]  Zexuan Zhu,et al.  Markov blanket-embedded genetic algorithm for gene selection , 2007, Pattern Recognit..

[56]  Sung-Bae Cho,et al.  Prediction of colon cancer using an evolutionary neural network , 2004, Neurocomputing.

[57]  Yi-Leh Wu,et al.  Feature selection using genetic algorithm and cluster validation , 2011, Expert Syst. Appl..

[58]  James L. McClelland Parallel Distributed Processing , 2005 .

[59]  Josep M. Sopena,et al.  Performing Feature Selection With Multilayer Perceptrons , 2008, IEEE Transactions on Neural Networks.

[60]  Byung Ro Moon,et al.  Hybrid Genetic Algorithms for Feature Selection , 2004, IEEE Trans. Pattern Anal. Mach. Intell..

[61]  Lutz Prechelt,et al.  A quantitative study of experimental evaluations of neural network learning algorithms: Current research practice , 1996, Neural Networks.

[62]  Nasser Ghasem-Aghaee,et al.  Text feature selection using ant colony optimization , 2009, Expert Syst. Appl..