A Bound on Kappa-Error Diagrams for Analysis of Classifier Ensembles

Kappa-error diagrams are used to gain insight into why one ensemble method performs better than another on a given data set. Each point on the diagram corresponds to a pair of classifiers: the x-axis is the pairwise diversity (kappa), and the y-axis is the averaged individual error of the pair. In this study, kappa is calculated from the 2 × 2 correct/wrong contingency matrix. We derive a lower bound on kappa which determines the feasible part of the kappa-error diagram. Simulations and experiments with real data show that there is unoccupied feasible space on the diagram corresponding to (hypothetical) better ensembles, and that individual accuracy is the leading factor in improving ensemble accuracy.
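To make the diagram's coordinates concrete, here is a minimal sketch in Python of how one point of a kappa-error diagram can be computed. It assumes each classifier is summarized by a boolean correctness vector over a common test set (the function name kappa_error_point and the toy vectors are ours, for illustration only); kappa uses the standard simplification of Cohen's kappa for a 2 × 2 table.

```python
import numpy as np

def kappa_error_point(correct_i, correct_j):
    """Return (kappa, averaged individual error) for one classifier pair.

    correct_i, correct_j: boolean arrays, True where the classifier
    labels the corresponding test object correctly.
    """
    correct_i = np.asarray(correct_i, dtype=bool)
    correct_j = np.asarray(correct_j, dtype=bool)
    n = len(correct_i)

    # 2 x 2 correct/wrong contingency counts
    a = np.sum(correct_i & correct_j)    # both correct
    b = np.sum(correct_i & ~correct_j)   # i correct, j wrong
    c = np.sum(~correct_i & correct_j)   # i wrong, j correct
    d = np.sum(~correct_i & ~correct_j)  # both wrong

    # Cohen's kappa for a 2 x 2 table:
    #   kappa = 2(ad - bc) / ((a + b)(b + d) + (a + c)(c + d))
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    # Degenerate case (identical correctness patterns): treat as full agreement.
    kappa = 2.0 * (a * d - b * c) / denom if denom else 1.0

    # Averaged individual error of the pair (y-axis)
    e_i = (c + d) / n
    e_j = (b + d) / n
    return kappa, (e_i + e_j) / 2.0

# Toy example: two hypothetical classifiers on 10 test objects
ci = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 0], dtype=bool)
cj = np.array([1, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
print(kappa_error_point(ci, cj))  # kappa ~ 0.048, averaged error = 0.30
```

Repeating this over all classifier pairs in an ensemble yields the cloud of points that forms the kappa-error diagram; the bound derived in the paper then delimits which part of that plane is feasible.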
