Making sense of a forest of trees

A common criticism of many methods for constructing tree models is that a single tree or nested sequence of trees is produced, and that much uncertainty about the tree structure is ignored. Recent search algorithms (bumping, boosting, simulated annealing, MCMC) address this problem by finding a much richer collection of trees. They lead to an embarrassment of riches, in that it may be difficult to make sense of the resultant forest. Quite often, the problem may not be as bad as it seems: although hundreds of distinct trees are identified, many will differ only at a few nodes. Other trees may have different topologies, but produce similar partitions of the predictor space. By defining several distance metrics on trees, we summarize a forest of trees by several archetypes and associated clusters.
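To make the idea of a partition-based tree distance concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not spell out its metrics, so the pair-counting distance below is an assumed stand-in, and the names `partition_distance`, `distance_matrix`, `medoid`, and `k_medoids` are hypothetical. Each tree is represented only by the leaf label it assigns to each observation, so two trees with different topologies but the same partition are at distance zero. Archetypes are then picked with a basic k-medoids loop, one of several clustering choices that would fit.

```python
from itertools import combinations


def partition_distance(leaves_a, leaves_b):
    """Fraction of observation pairs that one tree puts in the same leaf
    and the other tree separates (an assumed metric, not the paper's)."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both trees must label the same observations")
    pairs = list(combinations(range(len(leaves_a)), 2))
    if not pairs:
        return 0.0
    disagreements = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


def distance_matrix(forest):
    """Pairwise distances between all trees in the forest."""
    n = len(forest)
    return [[partition_distance(forest[r], forest[c]) for c in range(n)]
            for r in range(n)]


def medoid(members, dist):
    """Member with the smallest total distance to the other members."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def k_medoids(dist, k, max_iter=100):
    """Basic alternating k-medoids; returns {archetype index: member indices}."""
    medoids = list(range(k))  # naive initialisation: the first k trees
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(len(dist)):
            # A medoid always stays in its own cluster, so none is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        new_medoids = [medoid(members, dist) for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy forest: leaf labels for six observations under four trees.
forest = [
    [0, 0, 0, 1, 1, 1],  # two leaves
    [5, 5, 5, 7, 7, 7],  # same partition as the first tree, different labels
    [0, 0, 1, 1, 2, 2],  # three leaves
    [0, 0, 1, 1, 2, 3],  # like the third tree, with one extra split
]
dist = distance_matrix(forest)
clusters = k_medoids(dist, k=2)
for archetype, members in clusters.items():
    print("archetype tree", archetype, "represents trees", members)
```

Because the distance depends only on the induced partition of the observations, it ignores which variables and cut points produced each leaf. A metric that compares split variables or tree topology would group the same forest differently, which is one reason to consider several metrics rather than one.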
