Inducing Decision Trees based on a Cluster Quality Index

Decision trees are popular classifiers in data mining, artificial intelligence, and pattern recognition because they are accurate and easy to interpret. In this paper, we introduce a new procedure for inducing decision trees that yields trees that are more accurate, more compact, and more balanced. Each candidate split is evaluated using the Rand statistic, a cluster quality index based on external measures, which many authors regard as the best existing index of its kind. We compared our method with other state-of-the-art methods over 30 databases from the UCI Repository, and the results support our claims. We also introduce a new equation for measuring the balance of a binary tree.
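As a minimal sketch of the split-evaluation idea, the Rand statistic compares two partitions of the same items by counting pairwise agreements. When scoring a candidate split, one partition is induced by the split (which child each instance falls into) and the other is given by the true class labels. The example data below are illustrative assumptions, not taken from the paper:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand statistic between two partitions of the same items.

    Counts the fraction of item pairs on which the two partitions
    agree: either both place the pair in the same group, or both
    place it in different groups.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Hypothetical candidate split on six instances: 0 = left child,
# 1 = right child. The closer the induced partition matches the
# class labels, the higher the score (maximum 1.0).
classes = [0, 0, 0, 1, 1, 1]
split   = [0, 0, 1, 1, 1, 1]
score = rand_index(split, classes)  # 10 of 15 pairs agree
```

In a tree inducer, this score would be computed for every candidate split at a node and the split maximizing it selected, analogous to how information gain or the Gini index is used in classical algorithms.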
