Extensions to the CART Algorithm

Abstract

The CART concept induction algorithm recursively partitions the measurement space, displaying the resulting partitions as decision trees. Care must be taken, however, not to overfit the trees to the data; CART employs cross-validation (cv) to select an appropriately sized tree. Although unbiased, cv estimates exhibit high variance, a troublesome characteristic, particularly for small learning sets. This paper describes Monte Carlo experiments that illustrate the effectiveness of the .632 bootstrap as an alternative technique for tree selection and error estimation. In addition, a new incremental learning extension to CART is described.
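For reference, the .632 bootstrap estimator named above is conventionally defined (following Efron) as a weighted combination of the resubstitution error and the leave-one-out bootstrap error; the sketch below states that standard definition and is not drawn from the experiments reported in this paper:

\[
\widehat{\mathrm{Err}}^{(.632)} \;=\; 0.368\,\overline{\mathrm{err}} \;+\; 0.632\,\widehat{\mathrm{Err}}^{(1)},
\]

where \(\overline{\mathrm{err}}\) is the error of the tree on its own learning set (resubstitution error) and \(\widehat{\mathrm{Err}}^{(1)}\) averages, over bootstrap replicates, the error of the tree grown on each bootstrap sample when evaluated on the observations left out of that sample.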