Evolving decision trees with beam search-based initialization and lexicographic multi-objective evaluation

Decision-tree induction algorithms are among the most popular techniques for classification problems. However, traditional decision-tree induction algorithms split nodes greedily, which makes them inherently susceptible to converging to local optima. Evolutionary algorithms can avoid the problems associated with greedy search and have been successfully applied to the induction of decision trees. Previously, we proposed a lexicographic multi-objective genetic algorithm for decision-tree induction, named LEGAL-Tree. In this work, we substantially extend this approach with respect to two important evolutionary aspects: the initialization of the population and the fitness function. We carry out a comprehensive set of experiments to validate the extended algorithm. The experimental results suggest that it is able to outperform both traditional algorithms for decision-tree induction and another evolutionary algorithm in a variety of application domains.
