A Normative Examination of Ensemble Learning Algorithms

Ensemble learning algorithms combine the results of several classifiers to yield an aggregate classification. We present a normative evaluation of combination methods, applying and extending existing axiomatizations from social choice theory and statistics. For the case of multiple classes, we show that several seemingly innocuous and desirable properties can be satisfied simultaneously only by a dictatorship, in which the ensemble always follows a single fixed classifier. A weaker set of properties admits only the weighted-average combination rule. For the case of binary classification, we give axiomatic justifications for majority vote and for weighted majority. We also show that, even when every component algorithm reports that an attribute is probabilistically independent of the classification, common ensemble algorithms often destroy this independence information. We illustrate these theoretical results with experiments on stock market data, demonstrating how ensembles of classifiers can exhibit canonical voting paradoxes.
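To make the combination rules concrete, here is a minimal Python sketch (not from the paper; all probability vectors, weights, and rankings below are hypothetical) of plurality vote and the weighted-average rule, together with a toy Condorcet cycle of the kind the paper's experiments exhibit.

```python
# Minimal sketch: hypothetical data chosen only to illustrate the
# combination rules and the voting paradox discussed in the abstract.
import numpy as np

def majority_vote(labels):
    # Plurality vote over hard class labels.
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def weighted_average(probs, weights):
    # Linear opinion pool: weighted average of class-probability vectors.
    w = np.asarray(weights, dtype=float)
    return np.average(probs, axis=0, weights=w / w.sum())

# Three classifiers, three classes (a, b, c), probabilistic outputs.
probs = np.array([[0.6, 0.3, 0.1],   # classifier 1
                  [0.2, 0.5, 0.3],   # classifier 2
                  [0.3, 0.2, 0.5]])  # classifier 3
print(majority_vote(["a", "b", "a"]))      # -> a
print(weighted_average(probs, [1, 1, 1]))  # ~ [0.367, 0.333, 0.3]

# Condorcet-style paradox: each classifier ranks the classes, yet
# pairwise majority comparison is cyclic (a beats b, b beats c,
# c beats a), so no class is a consistent majority winner.
rankings = [["a", "b", "c"],   # classifier 1: a > b > c
            ["b", "c", "a"],   # classifier 2: b > c > a
            ["c", "a", "b"]]   # classifier 3: c > a > b

def pairwise_winner(x, y):
    # Class preferred to the other by a majority of the rankings.
    wins_x = sum(r.index(x) < r.index(y) for r in rankings)
    return x if 2 * wins_x > len(rankings) else y

for x, y in [("a", "b"), ("b", "c"), ("c", "a")]:
    print(x, "vs", y, "-> majority prefers", pairwise_winner(x, y))
```

The cyclic pairwise preference in the last loop shows why majority aggregation over more than two classes can fail to produce a well-defined winner, which is the flavor of paradox the abstract refers to.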
