Evolutionary computation for feature selection in classification problems

Feature subset selection (FSS) has received a great deal of attention in statistics, machine learning, and data mining. Real-world data analyzed by data mining algorithms can involve many redundant or irrelevant features, or simply too many features for a learning algorithm to handle efficiently. Feature selection is becoming essential as databases grow in size and complexity. The selection process is expected to yield better-performing models, greater computational efficiency, and simpler, more understandable models. Evolutionary computation (EC) encompasses a number of nature-inspired techniques such as genetic algorithms, genetic programming, ant colony optimization, and particle swarm optimization. Such techniques are well suited to feature selection because a feature subset is straightforward to represent and can be evaluated easily with wrapper or filter algorithms. Furthermore, the ability of these heuristic algorithms to search large spaces efficiently is a great advantage for the feature selection problem. Here, we review the use of different EC paradigms for feature selection in classification problems. We discuss the details of each implementation, including representation, evaluation, and validation. The review enables us to identify the EC algorithms that perform best for FSS and to point to future research directions. WIREs Data Mining Knowl Discov 2013, 3:381–407. doi: 10.1002/widm.1106 Conflict of interest: The authors have declared no conflicts of interest for this article.
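To make the point about representation and evaluation concrete, here is a minimal sketch of a genetic-algorithm wrapper. It is not taken from the reviewed article. A feature subset is encoded as a bit mask, and its fitness is the leave-one-out accuracy of a 1-nearest-neighbor classifier restricted to the selected features, minus a small penalty per feature. The synthetic data, the function names (`make_data`, `wrapper_fitness`, `ga_select`), and all parameter values are assumptions chosen for illustration only.

```python
import random


def make_data(n_samples=40, n_features=8, seed=0):
    """Synthetic two-class data (an assumption): only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label  # informative feature
        row[1] -= 3.0 * label  # informative feature
        X.append(row)
        y.append(label)
    return X, y


def wrapper_fitness(mask, X, y, penalty=0.01):
    """Leave-one-out 1-NN accuracy on the selected features, minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += int(best_label == y[i])
    return correct / len(X) - penalty * len(cols)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Simple generational GA over bit masks, with elitism and binary tournaments."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}  # memoize fitness, since the same mask is evaluated many times

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = wrapper_fitness(mask, X, y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Binary tournament on the sorted population: lower index wins.
            a = pop[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            b = pop[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per position.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, fitness(best), history


if __name__ == "__main__":
    X, y = make_data()
    mask, score, _ = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("penalized fitness:", round(score, 3))
```

Replacing `wrapper_fitness` with a classifier-independent score (for example, a correlation or mutual-information measure between the selected features and the class) would turn the same search into a filter approach. The bit-mask encoding and the search loop would stay unchanged.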
