Towards the automatic design of decision tree induction algorithms

Decision tree induction is one of the most widely used methods for extracting knowledge from data, since the resulting representation of knowledge is intuitive and easily understood by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. Following recent breakthroughs in the automatic design of machine learning algorithms, this work proposes two approaches for automatically generating generic decision tree induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves candidate solutions through mechanisms inspired by biological processes. We also propose guidelines for designing fitness functions for these evolutionary algorithms that take into account the requirements and needs of the end user.
