Forest Pruning Based on Tree-Node Order

This paper proposes a forest pruning method, F-Pruning, to improve the performance of ensembles of decision trees. Instead of trimming each decision tree separately and/or selecting an optimal or sub-optimal subset of base classifiers to form an ensemble, F-Pruning takes a fixed number of trimmed or untrimmed decision trees as a forest (ensemble) and prunes branches directly from the forest to improve ensemble accuracy. F-Pruning is a greedy algorithm: it ranks each node using an impurity measure and the number of examples in the node, and at each step prunes the node with the lowest rank. In this way, F-Pruning prunes forests quickly and significantly reduces the size of the final ensembles. Our experiments show that, compared with ensembles built by combining trimmed or untrimmed decision trees, forests pruned by F-Pruning generalize better on most of the data sets. Our experiments also show that running F-Pruning on sub-forests selected by EPIC [1] can significantly reduce the size of the final ensembles and slightly improve their classification accuracy.
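The greedy, forest-level loop described above can be sketched in Python. The abstract does not give F-Pruning's exact rank formula or stopping rule, so both are assumptions here: the hypothetical `rank` scores each internal node by the example-weighted Gini impurity decrease of its split, and pruning stops as soon as collapsing the lowest-ranked node would lower accuracy on a held-out validation set. All names (`Node`, `split`, `rank`, `f_prune`) and the toy forest are illustrative, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(eq=False)
class Node:
    counts: Counter                # training examples per class reaching this node
    feature: Optional[int] = None  # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None

    def label(self):
        # Majority class of the training examples at this node.
        return self.counts.most_common(1)[0][0]


def split(feature: int, threshold: float, left: Node, right: Node) -> Node:
    """Build an internal node whose class counts are the sum of its children's."""
    return Node(left.counts + right.counts, feature, threshold, left, right)


def gini(counts: Counter) -> float:
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def rank(node: Node) -> float:
    # HYPOTHETICAL rank: impurity decrease of the split, weighted by example
    # counts, so small-sample splits that barely reduce impurity rank lowest.
    def weighted(n: Node) -> float:
        return sum(n.counts.values()) * gini(n.counts)
    return weighted(node) - weighted(node.left) - weighted(node.right)


def predict(node: Node, x: Sequence[float]):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label()


def accuracy(forest: List[Node], X, y) -> float:
    # Majority vote over all trees in the forest.
    hits = 0
    for x, target in zip(X, y):
        votes = Counter(predict(tree, x) for tree in forest)
        hits += votes.most_common(1)[0][0] == target
    return hits / len(y)


def internal_nodes(node: Node) -> List[Node]:
    if node.is_leaf():
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)


def f_prune(forest: List[Node], X_val, y_val) -> float:
    """Greedy forest-level pruning, in place; returns final validation accuracy.

    ASSUMED stopping rule: stop as soon as collapsing the lowest-ranked node
    would lower validation accuracy (that last collapse is undone).
    """
    best = accuracy(forest, X_val, y_val)
    while True:
        candidates = [n for tree in forest for n in internal_nodes(tree)]
        if not candidates:
            return best
        node = min(candidates, key=rank)  # lowest rank across the whole forest
        saved = (node.feature, node.left, node.right)
        node.feature, node.left, node.right = None, None, None  # collapse to a leaf
        acc = accuracy(forest, X_val, y_val)
        if acc < best:
            node.feature, node.left, node.right = saved  # undo and stop
            return best
        best = acc


# Toy one-feature forest (illustrative data only).
t1 = split(0, 5.0,
           split(0, 2.0, Node(Counter({0: 5})), Node(Counter({0: 4, 1: 1}))),
           Node(Counter({0: 1, 1: 9})))
t2 = split(0, 5.0, Node(Counter({0: 8, 1: 2})), Node(Counter({0: 2, 1: 8})))
t3 = split(0, 5.0, Node(Counter({0: 9, 1: 1})), Node(Counter({0: 1, 1: 9})))
forest = [t1, t2, t3]
X_val, y_val = [[1.0], [3.0], [7.0], [9.0]], [0, 0, 1, 1]
result = f_prune(forest, X_val, y_val)
print(result)  # 1.0
```

On this toy forest the sketch first collapses the redundant inner split of `t1` (both of its leaves predict class 0), then collapses all of `t2` because the other two trees still carry the majority vote, and stops before removing a split that would cost accuracy. Because ranks are compared across every tree at once, a weak tree can be pruned away almost entirely, which is how pruning at the forest level can shrink an ensemble more than trimming each tree in isolation.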

[1] Xindong Wu, et al. Ensemble pruning via individual contribution ordering, 2010, KDD.

[2] Ian Witten, et al. Data Mining, 2000.

[3] Bernhard Pfahringer, et al. Relational Random Forests Based on Random Relational Rules, 2009, IJCAI.

[4] Huanhuan Chen, et al. Predictive Ensemble Pruning by Expectation Propagation, 2009, IEEE Transactions on Knowledge and Data Engineering.

[5] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[6] Eric Bauer, et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, 1999, Machine Learning.

[7] Trevor Hastie, et al. Additive Logistic Regression: A Statistical View of Boosting, 1998.

[8] Ludmila I. Kuncheva, et al. Combining Pattern Classifiers: Methods and Algorithms, 2004.

[9] Haijia Shi. Best-first Decision Tree Learning, 2007.

[10] Juan José Rodríguez Diez, et al. Rotation Forest: A New Classifier Ensemble Method, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[12] Subhash C. Bagui, et al. Combining Pattern Classifiers: Methods and Algorithms, 2005, Technometrics.

[13] Leo Breiman, et al. Classification and Regression Trees, 1984.

[14] Daniel Hernández-Lobato, et al. An Analysis of Ensemble Pruning Techniques Based on Ordered Aggregation, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Georgios C. Anagnostopoulos, et al. A k-norm pruning algorithm for decision tree classifiers based on error rate estimation, 2008, Machine Learning.