Simplifying decision trees: A survey

Induced decision trees are an extensively researched approach to classification tasks. For many practical tasks, however, the trees produced by tree-generation algorithms are too large and complex to be comprehensible to users. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and the literature has not previously been surveyed from the perspective of simplification. We present a framework that organizes approaches to tree simplification, and we summarize and critique those approaches within it. The purpose of this survey is to give researchers and practitioners a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree induction algorithms to case retrieval in case-based reasoning systems.
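To make the notion of tree simplification concrete, the sketch below illustrates one family of techniques in the survey's scope, post-pruning, using scikit-learn's CART-style cost-complexity pruning. This is a minimal illustrative sketch, not a method taken from the surveyed papers; the dataset choice and the refit-per-alpha loop are assumptions made for the example.

    # Minimal post-pruning sketch (assumption: scikit-learn's CART-style
    # cost-complexity pruning as a stand-in for the pruning methods surveyed).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Grow a full (unpruned) tree, then enumerate its cost-complexity
    # pruning path: each alpha corresponds to one pruned subtree.
    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    path = full.cost_complexity_pruning_path(X_tr, y_tr)

    # Refit at each alpha; larger alpha yields a smaller (simpler) tree.
    for alpha in path.ccp_alphas:
        t = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        print(f"alpha={alpha:.4f}  nodes={t.tree_.node_count}  "
              f"test acc={t.score(X_te, y_te):.3f}")

Sweeping alpha typically exposes the trade-off the abstract alludes to: node count falls sharply while held-out accuracy stays flat, or even improves, until the tree is pruned too aggressively.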
