Extracting Representative Tree Models From a Forest

A common criticism of many methods for constructing tree models is that they produce a single tree or a nested sequence of trees, ignoring much of the uncertainty about the tree structure. Recent search algorithms (bumping, boosting, simulated annealing, MCMC) address this problem by finding a much richer collection of trees. The result can be an embarrassment of riches: it may be difficult to make sense of the resulting forest. Quite often, the problem is not as bad as it seems. Although hundreds of distinct trees are identified, many differ only at a few nodes, and other trees may have different topologies yet produce similar partitions of the predictor space. By defining several distance metrics on trees, we summarize a forest of trees by several representative trees and associated clusters. A new plot, the added tree plot, is introduced as a means to decide how many trees to examine while simultaneously adjusting for the goodness-of-fit of the trees considered.
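To make the idea of trees with different topologies but similar partitions concrete, here is a minimal sketch under stated assumptions. It assumes each tree is summarized only by the terminal node (leaf) into which each training observation falls, and it uses one plausible partition-based distance: the fraction of observation pairs on which two trees disagree about whether the pair shares a leaf (equivalently, one minus the Rand index of the two leaf partitions). Under this distance, two trees that induce the same partition are at distance 0 regardless of their topology or leaf labels. Representatives are then chosen with a greedy k-medoid (PAM BUILD-style) step. That clustering rule is a swapped-in choice for illustration and not necessarily the procedure used in the paper, and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def partition_distance(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Hypothetical partition metric: fraction of observation pairs on which
    two trees disagree about whether the pair lands in the same leaf."""
    n = len(leaves_a)
    if n != len(leaves_b):
        raise ValueError("both trees must be evaluated on the same observations")
    if n < 2:
        return 0.0
    disagree = 0
    for i, j in combinations(range(n), 2):
        same_a = leaves_a[i] == leaves_a[j]
        same_b = leaves_b[i] == leaves_b[j]
        disagree += same_a != same_b  # True counts as 1
    return disagree / (n * (n - 1) / 2)


def distance_matrix(forest: List[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise partition distances between trees."""
    m = len(forest)
    d = [[0.0] * m for _ in range(m)]
    for a, b in combinations(range(m), 2):
        d[a][b] = d[b][a] = partition_distance(forest[a], forest[b])
    return d


def greedy_representatives(d: List[List[float]], k: int) -> List[int]:
    """Assumed greedy k-medoid selection: each step adds the tree that most
    reduces the total distance from every tree to its nearest representative."""
    m = len(d)
    reps: List[int] = []
    nearest = [float("inf")] * m  # distance from each tree to its closest rep
    for _ in range(min(k, m)):
        best, best_cost = -1, float("inf")
        for c in range(m):
            if c in reps:
                continue
            cost = sum(min(nearest[t], d[t][c]) for t in range(m))
            if cost < best_cost:
                best, best_cost = c, cost
        reps.append(best)
        nearest = [min(nearest[t], d[t][best]) for t in range(m)]
    return reps


def assign_clusters(d: List[List[float]], reps: List[int]) -> List[int]:
    """Assign each tree to its nearest representative (ties go to the earlier rep)."""
    return [min(reps, key=lambda r: d[t][r]) for t in range(len(d))]


# Toy forest: leaf ids for 4 observations under 4 trees.
forest = [
    [0, 0, 1, 1],  # tree 0
    [5, 5, 7, 7],  # tree 1: same partition as tree 0, different leaf labels
    [0, 0, 0, 1],  # tree 2
    [0, 1, 2, 3],  # tree 3: every observation in its own leaf
]
d = distance_matrix(forest)
reps = greedy_representatives(d, k=2)
print(reps)                      # [0, 2]
print(assign_clusters(d, reps))  # [0, 0, 2, 0]
```

Trees 0 and 1 collapse to the same cluster with distance 0 even though their leaf labels differ, which is the kind of redundancy the summary is meant to remove. A real forest would supply the leaf assignments from whatever tree-fitting code produced it.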