Shrinking classification trees for bootstrap aggregation

Abstract. Bootstrap aggregating (bagging) classification trees has been shown to improve accuracy over a single classification tree. However, it has been observed that bagging pruned trees does not necessarily perform better than bagging unpruned trees. Our goal here is to examine this issue in the context of shrinking, rather than pruning, trees. Because a bootstrap sample contains many duplicated observations, the usual choice of the shrinkage parameter by cross-validation (CV) within bagging is so conservative that the resulting shrunken tree differs little from the unshrunken tree, and the two therefore perform similarly. We propose to choose the shrinkage parameter for each base tree in bagging by using only the extra-bootstrap observations (those not drawn into that tree's bootstrap sample) as test cases. For the digit data taken from Breiman et al. (1984), we find that our proposal improves accuracy over bagging unshrunken trees.
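As an illustration of the selection step described above, the following minimal Python sketch draws one bootstrap sample, separates the extra-bootstrap observations, and picks a shrinkage value from a grid by minimizing an error measured on those observations only. It is a sketch under stated assumptions, not the procedure as implemented in the paper: the names `bootstrap_split`, `choose_shrinkage`, and `oob_error` are hypothetical, and the tree-fitting and shrinking steps are left to a user-supplied callable.

```python
import random


def bootstrap_split(n, rng):
    """Draw a bootstrap sample of size n from indices 0..n-1.

    Returns (in_bag, out_of_bag): in_bag may contain duplicates because
    sampling is with replacement; out_of_bag holds the extra-bootstrap
    indices, i.e. those never drawn.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def choose_shrinkage(candidates, oob_error):
    """Return the candidate shrinkage value with the smallest error.

    oob_error is a hypothetical callable (an assumption of this sketch):
    given a shrinkage value, it should shrink the tree grown on the
    bootstrap sample and return its error on the extra-bootstrap cases.
    """
    return min(candidates, key=oob_error)


rng = random.Random(0)
n = 1000
in_bag, out_of_bag = bootstrap_split(n, rng)

# Fraction of distinct observations in the bootstrap sample; the expected
# value is 1 - (1 - 1/n)**n, roughly 0.632 for large n.
unique_fraction = len(set(in_bag)) / n
```

Because only about 63% of the original observations appear in a bootstrap sample, cross-validation folds taken within that sample tend to share duplicated observations between training and test parts, whereas the extra-bootstrap observations are never seen by the tree being tuned.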