A comparative study of Reduced Error Pruning method in decision tree algorithms

Decision trees are among the most popular and efficient techniques in data mining, and they have been established and well explored by many researchers. However, some decision tree algorithms produce large trees that are difficult to understand, and misclassification often occurs during the learning process. A decision tree algorithm that produces a simple tree structure with a high classification rate is therefore needed for working with large volumes of data. Pruning methods have been introduced to reduce the complexity of the tree structure without decreasing classification accuracy; one such method is Reduced Error Pruning (REP). To better understand pruning methods, an experiment was conducted in the Weka application comparing the complexity of the tree structure and the classification accuracy of the J48, REPTree, PART, JRip, and Ridor algorithms on seven standard datasets from the UCI machine learning repository. In data modeling, J48 and REPTree generate a tree structure as output, while PART, Ridor, and JRip generate rules. In addition, J48, REPTree, and PART use the REP method for pruning, while Ridor and JRip use improvements of REP, namely the IREP and RIPPER methods. The experimental results show that J48 and REPTree are competitive in producing better results: the average difference between them is 7.1006% in classification accuracy and 6.2857% in tree-structure complexity. Among the rule-based algorithms, Ridor is the best compared to PART and JRip, achieving the highest classification accuracy on five of the seven datasets. An algorithm that produces high accuracy with a simple tree structure or simple rules can be regarded as the best decision tree algorithm.
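The core idea behind REP can be illustrated with a minimal sketch: the tree is traversed bottom-up, and each subtree is replaced with a majority-class leaf whenever doing so does not increase the error on a held-out pruning set. The `Node` structure and function names below are illustrative only and are not Weka's internals; the sketch assumes categorical features whose values all appear in the tree.

```python
# Minimal sketch of Reduced Error Pruning (REP) on a toy decision tree.
# This is an illustration of the general technique, not Weka's implementation.
from collections import Counter

class Node:
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature          # attribute to split on (internal nodes)
        self.children = children or {}  # feature value -> child Node
        self.label = label              # class label (leaf nodes)

    def is_leaf(self):
        return not self.children

def classify(node, x):
    # Assumes every feature value seen here has a branch in the tree.
    while not node.is_leaf():
        node = node.children[x[node.feature]]
    return node.label

def errors(node, data):
    # Number of pruning-set examples the (sub)tree misclassifies.
    return sum(1 for x, y in data if classify(node, x) != y)

def majority_label(data):
    return Counter(y for _, y in data).most_common(1)[0][0]

def reduced_error_prune(node, prune_set):
    """Bottom-up REP: replace a subtree with a majority-class leaf
    whenever that does not increase error on the pruning set."""
    if node.is_leaf() or not prune_set:
        return node
    for value, child in node.children.items():
        subset = [(x, y) for x, y in prune_set if x[node.feature] == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf = Node(label=majority_label(prune_set))
    if errors(leaf, prune_set) <= errors(node, prune_set):
        return leaf  # the leaf is at least as accurate: prune
    return node
```

For example, a root that splits on feature `a` into leaves "yes" (a=0) and "no" (a=1) collapses to a single "yes" leaf when the pruning set is labeled "yes" regardless of `a`, since the leaf makes fewer errors than the subtree. This need for a separate pruning set is REP's main cost, and it is what the IREP and RIPPER refinements used by Ridor and JRip address.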
