A weighted voting framework for classifier ensembles

We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, the recall combiner, and the naive Bayes combiner. The framework rests on two assumptions: class-conditional independence of the classifier outputs, and an assumption about the individual accuracies. The four combiners are derived successively from one another by progressively relaxing and then eliminating the second assumption. In parallel, the number of trainable parameters increases from one combiner to the next. Simulation studies reveal that if the parameter estimates are accurate and the first assumption is satisfied, the order of preference of the combiners is: naive Bayes, recall, weighted majority, and majority. By inducing label noise, we expose a caveat arising from the stability-plasticity dilemma. Experimental results with 73 benchmark data sets reveal that there is no definitive best combiner among the four candidates, although the naive Bayes combiner holds a slight edge overall. It was better for problems with a large number of fairly balanced classes, while weighted majority vote was better for problems with a small number of unbalanced classes.
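
To make the four combiners concrete, below is a minimal sketch in Python (with NumPy) of the decision rules the abstract describes. The function names, the epsilon smoothing, and the tie-breaking are our own illustrative choices; the weight and discriminant formulas follow the standard derivations of these combiners under the stated independence assumption, not code taken from the paper.

    import numpy as np

    def majority_vote(votes, n_classes):
        # Plurality over the label outputs; ties broken by lowest class index.
        counts = np.bincount(votes, minlength=n_classes)
        return int(np.argmax(counts))

    def weighted_majority_vote(votes, accuracies, n_classes):
        # Classifier i votes with weight log(p_i / (1 - p_i)), the optimal
        # weight for two classes under the independence assumption; a common
        # c-class variant uses log(p_i * (c - 1) / (1 - p_i)).
        eps = 1e-12  # smoothing to avoid log(0); our own addition
        w = np.log((accuracies + eps) / (1.0 - accuracies + eps))
        scores = np.zeros(n_classes)
        for s, w_i in zip(votes, w):
            scores[s] += w_i
        return int(np.argmax(scores))

    def recall_combiner(votes, recalls, priors):
        # recalls[i, k] = P(classifier i outputs k | true class k). The mass
        # 1 - recalls[i, k] is assumed spread evenly over the c - 1 wrong labels.
        c = len(priors)
        eps = 1e-12
        scores = np.log(np.asarray(priors) + eps)
        for i, s in enumerate(votes):
            for k in range(c):
                if s == k:
                    scores[k] += np.log(recalls[i, k] + eps)
                else:
                    scores[k] += np.log((1.0 - recalls[i, k]) / (c - 1) + eps)
        return int(np.argmax(scores))

    def naive_bayes_combiner(votes, confusions, priors):
        # confusions[i][k, s] = P(classifier i outputs s | true class k),
        # i.e. classifier i's row-normalised confusion matrix.
        eps = 1e-12
        scores = np.log(np.asarray(priors) + eps)
        for i, s in enumerate(votes):
            scores += np.log(confusions[i][:, s] + eps)
        return int(np.argmax(scores))

    # Toy usage: three classifiers label one object from two classes.
    votes = np.array([0, 1, 1])
    accuracies = np.array([0.9, 0.7, 0.6])
    print(majority_vote(votes, n_classes=2))                       # -> 1 (two of three votes)
    print(weighted_majority_vote(votes, accuracies, n_classes=2))  # -> 0 (the strong classifier outweighs the other two)

Note how the parameter count grows along the sequence, as the abstract describes: majority vote is parameter-free, weighted majority vote needs one accuracy per classifier, the recall combiner needs one recall per classifier per class, and the naive Bayes combiner needs each classifier's full confusion matrix. The toy usage also shows the weighted vote overturning the plurality when one classifier is markedly more accurate.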
