Moving towards efficient decision tree construction

Motivated by the desire to construct compact decision trees, where compactness is measured by the expected path length traversed to reach a decision, we propose a new node splitting measure for decision tree construction. We show that the proposed measure is convex and cumulative, and we use these properties in constructing decision trees for classification. Results on several datasets from the UCI repository show that the proposed measure yields more compact decision trees, with classification accuracy comparable to that obtained using popular node splitting measures such as Gain Ratio and the Gini Index.
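
For orientation, the two baseline measures named above and the compactness criterion can be sketched in a few lines of Python. This is only an illustrative sketch using the standard textbook definitions of the Gini Index and Gain Ratio; it does not implement the proposed measure, whose formula is not given in this abstract. The function names and the `(depth, n_samples)` leaf representation are assumptions made for the example.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_index(parent, children):
    """Size-weighted Gini impurity of the child nodes (lower is better)."""
    n = len(parent)
    return sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent, children):
    """Information gain divided by split information (higher is better)."""
    n = len(parent)
    gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum(
        (len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch
    )
    return gain / split_info if split_info > 0 else 0.0


def expected_path_length(leaves):
    """Expected depth reached per sample.

    `leaves` is a hypothetical representation: a list of
    (depth, n_samples) pairs, one per leaf of the tree.
    """
    total = sum(n for _, n in leaves)
    return sum(depth * n for depth, n in leaves) / total


# A split that separates two classes perfectly.
parent = ["a", "a", "b", "b"]
children = [["a", "a"], ["b", "b"]]
print(gini_index(parent, children))
print(gain_ratio(parent, children))

# A small tree with one leaf at depth 1 and two leaves at depth 2.
print(expected_path_length([(1, 2), (2, 1), (2, 1)]))
```

Under this reading, a tree is more compact when `expected_path_length` is smaller for the same data, which is the sense in which the trees are compared here.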
