An Analysis of Ensemble Pruning Techniques Based on Ordered Aggregation

Several pruning strategies that can be used to reduce the size and increase the accuracy of bagging ensembles are analyzed. These heuristics select subsets of complementary classifiers that, when combined, can perform better than the whole ensemble. The pruning methods investigated are based on modifying the order of aggregation of classifiers in the ensemble. In the original bagging algorithm, the order of aggregation is left unspecified. When this order is random, the generalization error typically decreases as the number of classifiers in the ensemble increases. If an appropriate ordering for the aggregation process is devised, the generalization error reaches a minimum at intermediate numbers of classifiers. This minimum lies below the asymptotic error of bagging. Pruned ensembles are obtained by retaining a fraction of the classifiers in the ordered ensemble. The performance of these pruned ensembles is evaluated in several benchmark classification tasks under different training conditions. The results of this empirical investigation show that ordered aggregation can be used for the efficient generation of pruned ensembles that are competitive, in terms of performance and robustness of classification, with computationally more costly methods that directly select optimal or near-optimal subensembles.

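To make the idea concrete, below is a minimal sketch of ordered aggregation followed by pruning, written against scikit-learn. It is illustrative only, not the authors' exact procedure: the greedy "reduce-error" ordering criterion, the use of a held-out selection set, the 20% pruning fraction, and all function and variable names are assumptions introduced for this example.

```python
# Minimal sketch (assumptions noted above): build a bagging ensemble, reorder
# its members greedily so that each added classifier minimises the majority-vote
# error of the growing subensemble, then keep only an initial fraction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_sel, y_train, y_sel = train_test_split(X, y, test_size=0.3, random_state=0)

# Standard bagging ensemble of decision trees (order of aggregation unspecified).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
bag.fit(X_train, y_train)

def greedy_order(estimators, X, y):
    """Greedy 'reduce-error' ordering: repeatedly append the classifier whose
    addition most reduces the majority-vote error on the selection set."""
    classes = np.unique(y)
    preds = np.array([est.predict(X) for est in estimators])            # (T, N)
    onehot = (preds[:, :, None] == classes[None, None, :]).astype(float)  # (T, N, C)
    remaining = set(range(len(estimators)))
    order = []
    votes = np.zeros((len(y), len(classes)))
    while remaining:
        best_i, best_err = None, np.inf
        for i in remaining:
            trial = votes + onehot[i]                       # hypothetical vote counts
            err = np.mean(classes[trial.argmax(axis=1)] != y)
            if err < best_err:
                best_i, best_err = i, err
        order.append(best_i)
        votes += onehot[best_i]
        remaining.remove(best_i)
    return order

order = greedy_order(bag.estimators_, X_sel, y_sel)
# Prune: retain, for instance, the first 20% of classifiers in the ordered ensemble.
pruned = [bag.estimators_[i] for i in order[: len(order) // 5]]
```

In this kind of scheme the pruning fraction is a design choice: because the error curve of the ordered ensemble typically reaches its minimum at intermediate sizes, a fixed fraction (or the size that minimises error on the selection set) can be used to decide how many classifiers to retain.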