Subtree Replacement in Decision Tree Simplification

The availability of efficient algorithms for decision tree induction makes intricate post-processing techniques worth investigating, both for efficiency and for effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm that integrates grafting with the standard pruning operator, and we analyze its complexity in terms of the number of nodes visited. Immediate instances of the parametric algorithm include extensions of error-based, reduced-error, minimum-error, and pessimistic-error pruning. Experimental results show that the computational cost of grafting is paid off by statistically significantly smaller trees with no loss of accuracy.
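
The following is a minimal sketch of the general idea, not the paper's parametric algorithm: a single bottom-up pass over a fitted tree that, at each internal node, chooses among keeping the subtree, pruning it to a leaf, or grafting the largest child's subtree in its place. The `Node` structure, the resubstitution `leaf_error` estimate, and the largest-branch grafting rule are illustrative assumptions; the parametric algorithm abstracts over the error estimate (error-based, reduced-error, minimum-error, pessimistic-error).

```python
# Sketch of bottom-up decision tree simplification combining pruning
# (replace a subtree with a leaf) and grafting / subtree replacement
# (replace a node with its largest child's subtree).
# The error estimate below is plain resubstitution error; it is a
# placeholder for the estimates handled by the parametric algorithm.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                       # majority class at this node
    n: int                           # training cases reaching the node
    errors: int                      # cases misclassified by `label` here
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def leaf_error(node: Node) -> float:
    """Estimated error if `node` were collapsed into a leaf."""
    return node.errors


def subtree_error(node: Node) -> float:
    """Estimated error of the subtree rooted at `node`."""
    if node.is_leaf:
        return leaf_error(node)
    return sum(subtree_error(c) for c in node.children)


def simplify(node: Node) -> Node:
    """Bottom-up pass: simplify the children first, then decide whether to
    keep the node, prune it to a leaf, or graft its largest child onto it."""
    if node.is_leaf:
        return node
    node.children = [simplify(c) for c in node.children]

    keep = subtree_error(node)
    prune = leaf_error(node)
    largest = max(node.children, key=lambda c: c.n)
    # Approximation: the grafted branch's own error stands in for its error
    # on *all* cases at the node (C4.5 redistributes the other branches' cases).
    graft = subtree_error(largest)

    best = min(prune, graft, keep)
    if best == prune:
        return Node(node.label, node.n, node.errors)   # collapse to a leaf
    if best == graft:
        return largest                                 # subtree replacement
    return node
```

Ties are resolved in favour of the simpler structure (leaf before graft before the unchanged subtree), mirroring the preference for smaller trees; the actual algorithm's choice depends on the error estimate instantiated.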
