A combinative method for decision tree construction

The choice of splitting attribute during decision tree construction largely determines the size and quality of the resulting tree. Although several selection criteria have been proposed, and good comparative studies of their results exist, no consensus has been reached on which method is best. In this paper we present a new approach in which each candidate attribute is evaluated by a set of available criteria, and the attribute voted best by the majority of the criteria is selected as the splitting attribute. Each criterion is computed from the contingency table built at each splitting node, and an OLAP-based approach is used for faster contingency table aggregation.
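The voting scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the particular criteria used here (information gain, gain ratio, Gini gain) and all function names are assumptions chosen for the example; the paper's actual criterion set may differ.

```python
import math
from collections import Counter

def contingency_table(values, labels):
    """Build a contingency table: attribute value -> Counter of class labels."""
    table = {}
    for v, c in zip(values, labels):
        table.setdefault(v, Counter())[c] += 1
    return table

def entropy(counts):
    """Shannon entropy of a class-count distribution, in bits."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def info_gain(table):
    """Information gain of a split, computed from its contingency table."""
    totals = Counter()
    for row in table.values():
        totals.update(row)
    n = sum(totals.values())
    cond = sum(sum(row.values()) / n * entropy(row.values())
               for row in table.values())
    return entropy(totals.values()) - cond

def gain_ratio(table):
    """C4.5-style gain ratio: information gain normalized by split information."""
    split_info = entropy([sum(row.values()) for row in table.values()])
    return info_gain(table) / split_info if split_info else 0.0

def gini_gain(table):
    """Reduction in Gini impurity achieved by the split."""
    def gini(counts):
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)
    totals = Counter()
    for row in table.values():
        totals.update(row)
    n = sum(totals.values())
    cond = sum(sum(row.values()) / n * gini(row.values())
               for row in table.values())
    return gini(totals.values()) - cond

# Hypothetical criterion set; the paper's combinative method would plug in
# whichever criteria are available.
CRITERIA = [info_gain, gain_ratio, gini_gain]

def vote_best_attribute(dataset, attributes, labels):
    """Each criterion votes for its top-scoring attribute; the attribute
    with the most votes wins the split."""
    tables = {a: contingency_table(dataset[a], labels) for a in attributes}
    votes = Counter(max(attributes, key=lambda a: crit(tables[a]))
                    for crit in CRITERIA)
    return votes.most_common(1)[0][0]
```

For example, on a toy dataset where attribute `a` separates the classes perfectly and attribute `b` is pure noise, all three criteria vote for `a`, so `vote_best_attribute` returns it unanimously.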
