HEAD-DT: Automatic Design of Decision-Tree Algorithms

As presented in Chap. 2, for the past 40 years researchers have attempted to improve decision-tree induction algorithms, whether by proposing new splitting criteria for internal nodes, by investigating pruning strategies to avoid overfitting, by testing new approaches for dealing with missing values, or by searching for alternatives to top-down greedy induction. Each new decision-tree induction algorithm adopts some (or many) of these strategies, chosen to maximize performance in empirical analyses. Nevertheless, after these 40 years of research, the number of different strategies for the several components of a decision-tree algorithm has become so vast that it would be impractical for a human being to test every possible combination in pursuit of the best performance on a given data set (or on a set of data sets). Hence, we pose two questions for researchers in the area: “is it possible to automate the design of decision-tree induction algorithms?” and, if so, “how can we automate the design of a decision-tree induction algorithm?”

The answer to these questions arose with the pioneering work of Pappa and Freitas [29], who proposed the automatic design of rule induction algorithms through an evolutionary algorithm (EA). The authors employed a grammar-based genetic programming (GP) algorithm whose individuals are, in fact, rule induction algorithms. Their approach successfully uses EAs to evolve a generic rule induction algorithm, which can then be applied to many different classification problems, instead of evolving a specific set of rules tailored to a particular data set.

As presented in Chap. 3, in the area of optimisation this type of approach is known as a hyper-heuristic (HH) [4, 11]. HHs are search methods for automatically selecting and combining simpler heuristics, resulting in a generic heuristic that can be used to solve any instance of a given optimisation problem. For instance, a HH can generate a generic heuristic for solving any instance of the timetabling problem (i.e., the allocation of any number of resources, subject to any set of constraints, in any schedule configuration), whereas a conventional EA would merely evolve a solution to one particular instance of the timetabling problem (i.e., a predefined set of resources and constraints in a given schedule configuration).

In this chapter, we present a hyper-heuristic strategy for automatically designing decision-tree induction algorithms, namely HEAD-DT (Hyper-Heuristic Evolutionary Algorithm for Automatically Designing Decision-Tree Algorithms). Section 4.1 introduces HEAD-DT and its evolutionary scheme. Section 4.2 presents the individual representation adopted by HEAD-DT to evolve decision-tree algorithms, along with details on each individual’s genes. Section 4.3 describes the evolutionary cycle of HEAD-DT, detailing its genetic operators. Section 4.4 depicts the fitness evaluation process in HEAD-DT and introduces two possible frameworks for executing it. Section 4.5 computes the total size of the search space that HEAD-DT is capable of traversing, whereas Sect. 4.6 discusses related work.
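To make the hyper-heuristic viewpoint concrete before HEAD-DT’s actual representation is presented (Sect. 4.2), the sketch below illustrates, in Python, the essential shift it implies: each individual in the search encodes an algorithm design (here, a choice of split criterion, a minimum-node-size stopping rule, and a depth limit), and its fitness is measured by running the algorithm it encodes and scoring the resulting tree. Everything in this sketch is a simplified assumption for illustration only: the three components and their options are stand-ins for HEAD-DT’s much richer gene set, and plain random search stands in for its evolutionary cycle.

```python
import random
from collections import Counter
from math import log2

# Candidate design components ("building blocks"). These options are
# illustrative stand-ins, not HEAD-DT's actual genes or grammar.

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

IMPURITY = {"entropy": entropy, "gini": gini}   # split criterion options
MIN_SPLIT = [2, 5, 10]                          # stopping rule: min node size
MAX_DEPTH = [1, 2, 3]                           # stopping rule: depth limit

def random_individual():
    """An individual encodes an *algorithm design*, not a decision tree:
    one choice per design component."""
    return {"impurity": random.choice(list(IMPURITY)),
            "min_split": random.choice(MIN_SPLIT),
            "max_depth": random.choice(MAX_DEPTH)}

def induce(X, y, design, depth=0):
    """Top-down induction whose behaviour is dictated by the design genes."""
    if (depth >= design["max_depth"] or len(y) < design["min_split"]
            or len(set(y)) == 1):
        return Counter(y).most_common(1)[0][0]      # leaf: majority class
    imp, best = IMPURITY[design["impurity"]], None
    for f in range(len(X[0])):                      # exhaustive split search
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # Weighted average impurity of the two children (lower is better).
            score = (len(left) * imp(left) + len(right) * imp(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    L = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    R = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            induce([r for r, _ in L], [lab for _, lab in L], design, depth + 1),
            induce([r for r, _ in R], [lab for _, lab in R], design, depth + 1))

def predict(node, row):
    while isinstance(node, tuple):                  # descend internal nodes
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node                                     # leaf = predicted class

def fitness(design, X_tr, y_tr, X_val, y_val):
    """Fitness of a design = quality of the algorithm it encodes, estimated
    by running that algorithm and scoring its tree on held-out data."""
    tree = induce(X_tr, y_tr, design)
    return sum(predict(tree, r) == lab
               for r, lab in zip(X_val, y_val)) / len(y_val)

if __name__ == "__main__":
    random.seed(0)
    # Toy data set (hypothetical): two numeric attributes, two classes.
    X = [(random.random(), random.random()) for _ in range(200)]
    y = ["pos" if a + b > 1.0 else "neg" for a, b in X]
    X_tr, y_tr, X_val, y_val = X[:120], y[:120], X[120:], y[120:]
    # Plain random search over designs stands in for HEAD-DT's EA here.
    best = max((random_individual() for _ in range(20)),
               key=lambda d: fitness(d, X_tr, y_tr, X_val, y_val))
    print(best, fitness(best, X_tr, y_tr, X_val, y_val))
```

Note that the number of candidate designs is simply the product of the option counts per component (2 × 3 × 3 = 18 in this toy setting); Sect. 4.5 performs the analogous computation for HEAD-DT’s full component set, where the product is far larger.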

References

[1] J. Ross Quinlan. Unknown Attribute Values in Induction. ML, 1989.

[2] David W. Corne et al. Hyper-heuristic decision tree induction. World Congress on Nature & Biologically Inspired Computing (NaBIC), 2009.

[3] João Gama et al. Ubiquitous Knowledge Discovery. IDA, 2011.

[4] Graham Kendall et al. A Tabu-Search Hyperheuristic for Timetabling and Rostering. Journal of Heuristics, 2003.

[5] Rodrigo C. Barros et al. Evolutionary model trees for handling continuous classes in machine learning. Information Sciences, 2011.

[6] J. Kent Martin. An Exact Probability Metric for Decision Tree Splitting and Stopping. Machine Learning, 1997.

[7] John Mingers. An Empirical Comparison of Selection Measures for Decision-Tree Induction. Machine Learning, 1989.

[8] Donato Malerba et al. A Comparative Analysis of Methods for Pruning Decision Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.

[9] W.-Y. Loh et al. Split Selection Methods for Classification Trees, 1997.

[10] Usama M. Fayyad et al. The Attribute Selection Problem in Decision Tree Generation. AAAI, 1992.

[11] Graham Kendall et al. A Classification of Hyper-heuristic Approaches, 2010.

[12] I. Bratko et al. Learning decision rules in noisy domains, 1987.

[13] Zoran Obradovic et al. Component-based decision trees for classification. Intelligent Data Analysis, 2011.

[14] M. F. Collen et al. Towards automated medical decisions. Computers and Biomedical Research, 1972.

[15] Andrew K. C. Wong et al. Class-Dependent Discretization for Inductive Learning from Continuous and Mixed-Mode Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.

[16] Alex Alves Freitas et al. Towards the automatic design of decision tree induction algorithms. GECCO, 2011.

[17] Hong-Yeop Song et al. A New Criterion in Selection and Discretization of Attributes for the Generation of Decision Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.

[18] Jerome H. Friedman. A Recursive Partitioning Decision Rule for Nonparametric Classification. IEEE Transactions on Computers, 1977.

[19] Tin Kam Ho et al. Complexity Measures of Supervised Classification Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.

[20] B. Silverman et al. Block diagrams and splitting criteria for classification trees, 1993.

[21] B. Chandra et al. Moving towards efficient decision tree construction. Information Sciences, 2009.

[22] Sanja Petrovic et al. Recent research directions in automated timetabling. European Journal of Operational Research, 2002.

[23] J. R. Quinlan. Decision Trees as Probabilistic Classifiers, 1987.

[24] J. Ross Quinlan. Simplifying Decision Trees. International Journal of Man-Machine Studies, 1987.

[25] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[26] Ian H. Witten et al. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2002.

[27] Ivan Bratko et al. Experiments in automatic learning of medical diagnostic rules, 1984.

[28] J. Ross Quinlan. Induction of Decision Trees. Machine Learning, 1986.

[29] Gisele L. Pappa et al. Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach. Springer, 2009.

[30] Ian H. Witten et al. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, 2011.

[31] Alex Alves Freitas et al. A Survey of Evolutionary Algorithms for Decision-Tree Induction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012.

[32] John Mingers. Expert Systems—Rule Induction with Statistical Data, 1987.

[33] Leo Breiman et al. Classification and Regression Trees, 1984.

[34] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 1948.

[35] Peter Clark et al. The CN2 Induction Algorithm. Machine Learning, 1989.

[36] Ravi Kothari et al. A new node splitting measure for decision tree construction. Pattern Recognition, 2010.