Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction

In data analysis, induction of decision trees serves two main goals: first, induced decision trees can be used for classification/prediction of new instances, and second, they represent an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated using N-fold cross-validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively non-robust: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is constructed by consensus clustering of N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than a decision tree constructed from the entire set of training instances.
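The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: it induces N decision trees in N-fold cross-validation and relabels each instance by the majority vote of those trees before inducing a final tree, whereas the paper uses consensus hierarchical clustering for the relabelling step. The use of scikit-learn, the Iris data, and majority voting are all assumptions of this sketch.

```python
# Sketch only: N trees from N-fold CV, majority-vote relabelling, final tree.
# The paper's actual consensus step (hierarchical clustering) is not reproduced.
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Induce one tree per fold, each trained on that fold's training split.
trees = []
for train_idx, _ in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx]))

# Relabel every instance with the majority prediction of the N trees.
votes = [t.predict(X) for t in trees]
y_relabelled = [Counter(col).most_common(1)[0][0] for col in zip(*votes)]

# The final tree is induced from the relabelled data; labels overturned by the
# vote tend to be noisy ones, which typically yields a simpler tree.
consensus_tree = DecisionTreeClassifier(random_state=0).fit(X, y_relabelled)
print(consensus_tree.get_n_leaves())
```

The vote acts as a noise filter: an instance whose original label disagrees with most of the cross-validation trees is relabelled, so the final tree need not grow extra branches to fit it.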
