A Theoretical Analysis of Bagging as a Linear Combination of Classifiers

We apply an analytical framework for linearly combined classifiers to ensembles generated by bagging. This yields an analytical model of the misclassification probability of bagging as a function of the ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions. The model also allows us to derive a novel and theoretically grounded guideline for choosing the bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow us to compare the misclassification probability of bagging with that of an individual classifier trained on the original training set, we discuss how the theoretical framework could be exploited to this end.
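
A minimal sketch of the kind of relation such a framework yields, assuming it follows Tumer and Ghosh's added-error analysis of the simple average of $N$ classifiers whose estimation errors are unbiased with average pairwise correlation $\rho$ (the notation here is illustrative, not necessarily the paper's own):

$$E_{\mathrm{add}}^{\mathrm{ave}}(N) \;=\; \frac{1 + \rho\,(N-1)}{N}\,\bar{E}_{\mathrm{add}},$$

where $\bar{E}_{\mathrm{add}}$ is the average added error (beyond the Bayes error) of an individual classifier. Under these assumptions the added error decreases monotonically with $N$ and saturates at $\rho\,\bar{E}_{\mathrm{add}}$, which is consistent with a guideline of choosing the ensemble size at which this curve flattens.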
