An Information Theoretic Perspective on Multiple Classifier Systems

This paper examines the benefits that information theory can bring to the study of multiple classifier systems. We first discuss the relationship between the mutual information and the classification error of a predictor. We then show how this applies to ensemble systems, deriving a natural expansion of the ensemble mutual information into "accuracy" and "diversity" components. This expansion yields a diversity term naturally, in contrast to previous attempts to define one artificially. The main finding is that diversity in fact exists at multiple orders of correlation, and that pairwise diversity measures can capture only the low-order components.
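As a sketch of the quantities involved (a hedged reconstruction, not the paper's exact statements): the classical bounds of Fano and of Hellman and Raviv tie the probability of error $p_e$ in predicting a target $Y$ from an observation $X$ to the conditional entropy $H(Y \mid X)$,

\[
H(Y \mid X) \;\le\; H(p_e) + p_e \log\bigl(|\mathcal{Y}| - 1\bigr) \quad \text{(Fano)},
\qquad
p_e \;\le\; \tfrac{1}{2}\, H(Y \mid X) \quad \text{(Hellman--Raviv)}.
\]

Since $I(X;Y) = H(Y) - H(Y \mid X)$, raising the mutual information tightens both bounds on the error. For an ensemble with member outputs $X_1, \dots, X_T$, one natural expansion of the joint mutual information, written here with McGill's interaction information $I(\cdot)$ over sets of variables, is

\[
I(X_{1:T}; Y)
\;=\;
\underbrace{\sum_{i=1}^{T} I(X_i; Y)}_{\text{individual relevance (``accuracy'')}}
\;+\;
\underbrace{\sum_{k=2}^{T} \sum_{\substack{S \subseteq \{1,\dots,T\} \\ |S| = k}} I\bigl(\{X_j : j \in S\} \cup \{Y\}\bigr)}_{\text{interaction terms (``diversity'') at orders } 2,\dots,T}.
\]

For $T = 2$ the diversity component is the single pairwise term $I(X_1; X_2 \mid Y) - I(X_1; X_2)$; larger ensembles contribute interaction terms at every order $k > 2$ as well, which is why pairwise diversity measures recover only the low-order part of the sum.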
