A Survey of Evolutionary Algorithms for Decision-Tree Induction

This paper presents a survey of evolutionary algorithms designed for decision-tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that use evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy that addresses both works that evolve decision trees and works that design decision-tree components using evolutionary algorithms. Finally, it provides a number of references that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of the paper, we address some important issues and open questions that can be the subject of future research.
