Ensemble diversity measures and their application to thinning

Abstract The diversity of an ensemble of classifiers can be calculated in a variety of ways. Here a diversity metric and a means for altering the diversity of an ensemble, called "thinning", are introduced. We evaluate thinning algorithms on ensembles created by several techniques, using 22 publicly available datasets. When compared to other methods, our percentage correct diversity measure shows the greatest correlation between the increase in voted ensemble accuracy and the diversity value. Also, the analysis of different ensemble creation methods indicates that they generate different levels of diversity. Finally, the proposed thinning methods show that ensembles can be made smaller without a loss in accuracy.
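
To make the two mechanisms in the abstract concrete, below is a minimal sketch in Python (NumPy) of a percentage-correct diversity measure and a thinning loop. The 10%/90% thresholds follow the paper's description of the percentage correct diversity measure, while `thin_ensemble` is a generic backward-elimination variant on a validation set rather than the paper's exact thinning procedures; all function and variable names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def percentage_correct_diversity(correct, low=0.1, high=0.9):
    """Percentage-correct diversity measure (PCDM): the percentage of
    examples for which between `low` and `high` of the classifiers are
    correct (10%-90% here, per the paper's description).

    `correct` is an (n_classifiers, n_examples) boolean array where
    correct[i, j] is True if classifier i labels example j correctly.
    """
    frac_correct = correct.mean(axis=0)   # per-example fraction of correct voters
    ambiguous = (frac_correct >= low) & (frac_correct <= high)
    return 100.0 * ambiguous.mean()

def thin_ensemble(votes, labels_true, target_size):
    """Generic backward-elimination thinning (an illustrative variant,
    not necessarily the paper's exact procedure).

    `votes` is an (n_classifiers, n_examples) array of integer class
    labels predicted on a validation set. At each step, the classifier
    whose removal yields the best majority-vote accuracy is dropped,
    until `target_size` classifiers remain.
    """
    kept = list(range(votes.shape[0]))
    while len(kept) > target_size:
        best_acc, worst = -1.0, None
        for i in kept:
            subset = [j for j in kept if j != i]
            # plurality vote of the remaining classifiers on each example
            pred = np.apply_along_axis(
                lambda col: np.bincount(col).argmax(), 0, votes[subset])
            acc = (pred == labels_true).mean()
            if acc > best_acc:
                best_acc, worst = acc, i
        kept.remove(worst)   # drop the member whose removal helps most
    return kept
```

Driving the removal decision from held-out validation accuracy, as sketched, is one way an ensemble can be shrunk without the accuracy loss the abstract reports; the paper's own thinning algorithms instead exploit the diversity measure directly.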
