Anytime Learning of Decision Trees

The majority of existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a predetermined allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
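The sampling-based split evaluation can be illustrated with a small sketch. The Python code below is a minimal illustration, not the authors' exact procedure: it estimates the size of the smallest consistent tree under each candidate attribute by growing a few random consistent trees rooted at that attribute and keeping the smallest, then selects the attribute with the smallest estimate. The helper names (sample_tree_size, choose_split), the nominal-attribute data layout, and the uniform random choice of attributes below the root are assumptions made for brevity; the actual algorithm biases that choice toward more informative attributes.

import random

def sample_tree_size(examples, root_attr, attributes):
    # Grow one random tree consistent with the examples, rooted at root_attr,
    # and return its number of leaves as a proxy for tree size.
    labels = {y for _, y in examples}
    if len(labels) <= 1:
        return 1  # pure (or empty) subset: a single leaf suffices
    remaining = [a for a in attributes if a != root_attr]
    size = 0
    for value in {x[root_attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[root_attr] == value]
        if remaining:
            # Below the root, pick the next attribute at random (uniform here;
            # the paper's sampler biases this choice toward informative attributes).
            size += sample_tree_size(subset, random.choice(remaining), remaining)
        else:
            size += 1  # no attributes left: this branch becomes a leaf
    return size

def choose_split(examples, attributes, sample_size=5):
    # Score each candidate attribute by the smallest tree found among a few
    # random samples rooted at it, and return the attribute with the best score.
    best_attr, best_est = None, float("inf")
    for attr in attributes:
        est = min(sample_tree_size(examples, attr, attributes)
                  for _ in range(sample_size))
        if est < best_est:
            best_attr, best_est = attr, est
    return best_attr

# Example: XOR-like data over three binary attributes, where a2 is irrelevant.
# Information gain cannot distinguish the attributes here, but the sampled
# tree-size estimate typically prefers a0 or a1 over a2.
data = [({"a0": x0, "a1": x1, "a2": x2}, x0 ^ x1)
        for x0 in (0, 1) for x1 in (0, 1) for x2 in (0, 1)]
print(choose_split(data, ["a0", "a1", "a2"]))

Applied recursively at every node, with the sample size fixed from the time allocation (contract setting) or increased on selected subtrees as long as time remains (interruptible setting), this gives the speed-for-quality trade-off described in the abstract.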
