Paper information - The comparison between classification trees through proximity measures

Several proximity measures have been proposed to compare classifications derived from different clustering algorithms. Few solutions have been proposed for comparing two classification trees: some measure the difference between the structures of the trees, while others compare the partitions associated with the trees, taking their predictive power into account. Their features and limitations are discussed. Furthermore, a new dissimilarity measure is proposed that considers both aspects, which the previous measures explore separately. Three of these measures are then compared on two different classification problems: a real data set and a simulation study. For the real data set, the study also evaluates how, and to what extent, each of the considered measures is influenced by the presence of highly predictive variables that are also highly correlated.

Gabriele Soffritti | Rossella Miglio
