In this work, we present a new bottom-up algorithm for decision tree pruning that is very efficient (requiring only a single pass through the given tree), and prove a strong performance guarantee for the generalization error of the resulting pruned tree. We work in the typical setting in which the given tree T may have been derived from the given training sample S, and thus may badly overfit S. In this setting, we give bounds on the amount of additional generalization error that our pruning suffers compared to the optimal pruning of T. More generally, our results show that if there is a pruning of T with small error, and whose size is small compared to |S|, then our algorithm will find a pruning whose error is not much larger. This style of result has been called an index of resolvability result by Barron and Cover [1] in the context of density estimation. A novel feature of our algorithm is its locality: the decision to prune a subtree is based entirely on properties of that subtree and the sample reaching it. To analyze our algorithm, we develop tools of local uniform convergence, a generalization of the standard notion that may prove useful in other settings.
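To make the locality property concrete, here is a minimal Python sketch of a penalty-based, single-pass, bottom-up pruning procedure. The node structure, the `alpha` parameter, and the specific square-root penalty are illustrative assumptions only; the paper's actual pruning criterion is derived from its local uniform convergence bounds, which this placeholder does not reproduce.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """Hypothetical node type: each node stores the labels of the
    training points in S that reach it, plus a majority-vote prediction.
    For simplicity the sketch assumes a full binary tree."""
    labels: List[int]
    prediction: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def leaf_error(node: Node) -> int:
    # Training error if this node were a leaf predicting its majority label.
    return sum(1 for y in node.labels if y != node.prediction)

def prune(node: Node, m: int, alpha: float = 1.0) -> int:
    """Single bottom-up pass. The decision to prune a subtree uses only
    that subtree and the sample reaching it (locality). Returns the
    training error of the pruned subtree; `m` is the total sample size
    |S|, and the penalty term is a stand-in for the paper's local
    uniform convergence bound."""
    if node.left is None and node.right is None:
        return leaf_error(node)
    # Recursively prune the children first (bottom-up order).
    subtree_err = prune(node.left, m, alpha) + prune(node.right, m, alpha)
    as_leaf_err = leaf_error(node)
    # Hypothetical local deviation penalty: depends only on the local
    # sample size n and the global sample size m.
    n = len(node.labels)
    penalty = alpha * math.sqrt(n * math.log(max(m, 2)))
    if as_leaf_err <= subtree_err + penalty:
        node.left = node.right = None  # collapse the subtree to a leaf
        return as_leaf_err
    return subtree_err
```

Calling `prune` on the root visits each node exactly once, which is what makes the pass single-sweep: no global statistics are recomputed, and every pruning decision reads only the subtree below it.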
[1] Andrew R. Barron and Thomas M. Cover, "Minimum complexity density estimation," IEEE Trans. Inf. Theory, 1991.
[2] J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[3] David P. Helmbold and Robert E. Schapire, "Predicting Nearly as Well as the Best Pruning of a Decision Tree," COLT '95, 1995.
[4] Yishay Mansour, "Pessimistic Decision Tree Pruning Based on Tree Size," ICML, 1997.
[5] Michael J. Kearns and Yishay Mansour, "On the Boosting Ability of Top-Down Decision Tree Learning Algorithms," J. Comput. Syst. Sci., 1999.