Extensions to the CART Algorithm

Abstract

The CART concept induction algorithm recursively partitions the measurement space, displaying the resulting partitions as decision trees. Care must be taken, however, not to overfit the trees to the data; CART employs cross-validation (cv) to select an appropriately sized tree. Although unbiased, cv estimates exhibit high variance, a troublesome characteristic, particularly for small learning sets. This paper describes Monte Carlo experiments that illustrate the effectiveness of the .632 bootstrap as an alternative technique for tree selection and error estimation. In addition, a new incremental learning extension to CART is described.
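For reference, the .632 bootstrap estimator named above is conventionally defined (following Efron) as a weighted combination of the resubstitution error and the leave-one-out bootstrap error; the sketch below states that standard definition and is not drawn from the experiments reported in this paper:

\[
\widehat{\mathrm{Err}}^{(.632)} \;=\; 0.368\,\overline{\mathrm{err}} \;+\; 0.632\,\widehat{\mathrm{Err}}^{(1)},
\]

where \(\overline{\mathrm{err}}\) is the error of the tree on its own learning set (resubstitution error) and \(\widehat{\mathrm{Err}}^{(1)}\) averages, over bootstrap replicates, the error of the tree grown on each bootstrap sample when evaluated on the observations left out of that sample.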