A Fast Way to Produce Optimal Fixed-Depth Decision Trees

Decision trees play an essential role in many classification tasks. In some circumstances, we only want to consider fixed-depth trees. Unfortunately, finding the optimal depth-d decision tree can require time exponential in d. This paper presents a fast way to produce a fixed-depth decision tree that is optimal under the Naïve Bayes (NB) assumption. Here, we prove that the optimal feature to test at depth d depends only on the posterior probability of the class label given the tests previously performed, and not directly on either the identity or the outcomes of those tests. We can therefore precompute, in a fast pre-processing step, which features to use at the final layer. This yields a speedup of O(n / log n), where n is the number of features. We apply this technique to learning fixed-depth decision trees from standard UCI repository datasets, and find that it reduces the computational cost significantly. Surprisingly, the approach still achieves relatively high classification accuracy, despite the NB assumption.
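To make the precomputation idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm; it only illustrates the key observation under simplifying assumptions (binary class, binary features, known NB conditionals). All function names, the posterior grid, and the example parameters below are illustrative assumptions. The point shown: under NB, the expected 0/1 error of testing any feature at the final layer is a function of the current class posterior p alone, so the best last-layer feature can be tabulated over posterior values once and then looked up at classification time.

```python
import numpy as np

def expected_error_after_test(p, theta1, theta0):
    """Expected 0/1 error of testing a binary feature with
    P(x=1|y=1)=theta1 and P(x=1|y=0)=theta0, given the current
    posterior p = P(y=1), then predicting the more probable class
    in each branch. Note this depends only on p and the feature's
    conditionals, not on which tests produced p."""
    err = 0.0
    for t1, t0 in [(theta1, theta0), (1 - theta1, 1 - theta0)]:
        # Joint masses of (this branch, y=1) and (this branch, y=0);
        # the smaller mass is the error incurred by the majority rule.
        err += min(p * t1, (1 - p) * t0)
    return err

def precompute_last_layer(thetas, grid_size=1000):
    """Tabulate, for each posterior value on a grid, which feature
    minimizes the expected error at the final layer.
    thetas: array of shape (n, 2) holding
    (P(x_i=1|y=1), P(x_i=1|y=0)) for each feature i."""
    grid = np.linspace(0.0, 1.0, grid_size)
    best = np.empty(grid_size, dtype=int)
    for k, p in enumerate(grid):
        errs = [expected_error_after_test(p, t1, t0) for t1, t0 in thetas]
        best[k] = int(np.argmin(errs))
    return grid, best

# Usage: at classification time, map the current posterior to the
# precomputed best final-layer feature with a single table lookup.
thetas = np.array([[0.9, 0.2], [0.6, 0.4], [0.8, 0.1]])  # made-up conditionals
grid, best = precompute_last_layer(thetas)
p_current = 0.35  # posterior after whatever tests were already performed
i_star = best[int(round(p_current * (len(grid) - 1)))]
print(f"Best final-layer feature for posterior {p_current}: x_{i_star}")
```

The design point this sketch captures is that the lookup table is built once per dataset, so the last layer of every branch of the tree is chosen in O(1) per node rather than by re-scoring all n features, which is where a speedup on the order of the number of features comes from.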
