On the boosting ability of top-down decision tree learning algorithms

We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 − γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^{O(1/(γ²ε²))} and (1/ε)^{O(log(1/ε)/γ²)} (respectively) suffice to drive the error below ε. Thus (for example), a small constant advantage over random guessing is amplified to any constant error with trees of constant size. For a new splitting criterion suggested by our analysis, the much stronger bound of (1/ε)^{O(1/γ²)} (which is polynomial in 1/ε) is obtained. The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion. The primary contribution of this work is in proving that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.
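To make the setting concrete, below is a minimal Python sketch (not the authors' code) of greedy top-down tree growth with the three splitting criteria discussed in the abstract: a Gini-style criterion as used by CART, the binary entropy used by C4.5, and the new criterion G(q) = 2√(q(1−q)) suggested by the analysis. The exact normalizations, the restriction to binary input features as splitting functions, and all names below are assumptions made for illustration; the paper allows arbitrary weakly approximating functions at the internal nodes.

```python
# Illustrative sketch (assumptions noted above): top-down decision tree growth
# with a pluggable splitting criterion G.  G is evaluated on q = Pr[label = 1]
# within a node, and the split chosen is the one that most reduces the
# weighted average of G over the leaves of the current tree.
import math

def gini(q):
    """Gini-style criterion (normalized here so that G(1/2) = 1)."""
    return 4.0 * q * (1.0 - q)

def entropy(q):
    """Binary entropy criterion, as used by C4.5."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def kearns_mansour(q):
    """New criterion suggested by the analysis: G(q) = 2*sqrt(q*(1-q))."""
    return 2.0 * math.sqrt(q * (1.0 - q))

def node_impurity(labels, criterion):
    if not labels:
        return 0.0
    q = sum(labels) / len(labels)
    return criterion(q)

def best_split(examples, labels, features, criterion):
    """Pick the feature whose split most reduces impurity within this node."""
    n = len(labels)
    parent = node_impurity(labels, criterion)
    best = (None, 0.0)  # (feature index, impurity reduction)
    for f in features:
        left = [y for x, y in zip(examples, labels) if x[f] == 0]
        right = [y for x, y in zip(examples, labels) if x[f] == 1]
        child = (len(left) / n) * node_impurity(left, criterion) \
              + (len(right) / n) * node_impurity(right, criterion)
        if parent - child > best[1]:
            best = (f, parent - child)
    return best

def grow_tree(examples, labels, features, criterion, max_leaves=8):
    """Repeatedly split the leaf/feature pair with the largest weighted gain."""
    leaves = [list(range(len(labels)))]   # each leaf keeps the indices it covers
    splits = []                           # record of chosen splits, for inspection
    while len(leaves) < max_leaves:
        best = None  # (weighted gain, leaf position, feature)
        for li, idx in enumerate(leaves):
            ex = [examples[i] for i in idx]
            ys = [labels[i] for i in idx]
            f, gain = best_split(ex, ys, features, criterion)
            if f is not None and (best is None or gain * len(idx) > best[0]):
                best = (gain * len(idx), li, f)
        if best is None:
            break
        _, li, f = best
        idx = leaves.pop(li)
        leaves.append([i for i in idx if examples[i][f] == 0])
        leaves.append([i for i in idx if examples[i][f] == 1])
        splits.append(f)
    return splits, leaves

if __name__ == "__main__":
    # Tiny example: the target is the majority of three Boolean features, so
    # each single feature is only a weak approximator of the target.
    X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    y = [1 if a + b + c >= 2 else 0 for a, b, c in X]
    for crit in (gini, entropy, kearns_mansour):
        splits, leaves = grow_tree(X, y, features=[0, 1, 2], criterion=crit)
        print(crit.__name__, "chose splits on features:", splits)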
