Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy

Diversity among the members of a team of classifiers is deemed to be a key issue in classifier combination. However, measuring diversity is not straightforward because there is no generally accepted formal definition. We have found and studied ten statistics that can measure diversity among binary classifier outputs (correct or incorrect vote for the class label): four averaged pairwise measures (the Q statistic, the correlation, the disagreement, and the double fault) and six non-pairwise measures (the entropy of the votes, the difficulty index, the Kohavi-Wolpert variance, the interrater agreement, the generalized diversity, and the coincident failure diversity). Four experiments have been designed to examine the relationship between the accuracy of the team and the measures of diversity, and among the measures themselves. Although there are proven connections between diversity and accuracy in some special cases, our results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.
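To make the four pairwise measures concrete, the sketch below (illustrative only; the function name and toy data are not from the paper) computes them for one pair of classifiers from their oracle outputs on N test objects, where each output is 1 for a correct vote on the class label and 0 for an incorrect one. With n11, n00, n10, and n01 denoting the fractions of objects on which both, neither, only the first, or only the second classifier is correct, the Q statistic is (n11·n00 − n01·n10)/(n11·n00 + n01·n10), the correlation divides the same numerator by the geometric mean of the four marginal products, the disagreement is n01 + n10, and the double fault is n00.

```python
import numpy as np

def pairwise_diversity(y1, y2):
    """Pairwise diversity measures for two classifiers' oracle outputs.

    y1, y2 : binary sequences over the same N test objects,
             1 = correct vote for the class label, 0 = incorrect.
    Returns the Q statistic, the correlation rho, the disagreement,
    and the double fault (packaging and names are illustrative).
    """
    y1 = np.asarray(y1, dtype=bool)
    y2 = np.asarray(y2, dtype=bool)
    n = len(y1)

    # Fractions of the N objects in each of the four joint outcomes.
    n11 = np.sum(y1 & y2) / n     # both correct
    n00 = np.sum(~y1 & ~y2) / n   # both wrong
    n10 = np.sum(y1 & ~y2) / n    # only the first correct
    n01 = np.sum(~y1 & y2) / n    # only the second correct

    q = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)
    rho = (n11 * n00 - n01 * n10) / np.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    disagreement = n01 + n10      # fraction where exactly one is correct
    double_fault = n00            # fraction where both fail together
    return q, rho, disagreement, double_fault

# Toy example: oracle outputs of two classifiers on ten objects.
a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1]
print(pairwise_diversity(a, b))
```

For a team of L classifiers, the paper averages each pairwise measure over all L(L−1)/2 classifier pairs; Q, the correlation, and the double fault decrease with diversity, while the disagreement increases with it. The six non-pairwise measures instead operate on the whole N×L matrix of oracle outputs at once.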
