Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA

Classification is an important problem in data mining. Given a data set, a classifier generates a meaningful description for each class. Decision trees are among the most effective and widely used classification methods, and several algorithms exist for inducing them. A tree is first induced, and a subsequent pruning phase then removes subtrees to improve accuracy and prevent overfitting. This paper discusses various pruning methods and their features, and evaluates the effectiveness of pruning. Accuracy and tree size are measured on the diabetes and glass datasets under various pruning factors.
