Learning the areas of expertise of classifiers in an ensemble

Abstract There are various machine learning algorithms for extracting patterns from data, but combining the decisions of several learners has recently become a popular way to improve accuracy over single-learner systems. The fundamental idea behind combining the decisions of an ensemble of classifiers is that different classifiers tend to misclassify different patterns, so suitably combining the decisions of complementary classifiers can improve accuracy. In this paper, we investigate two kinds of classifier systems that estimate how much weight to give each base classifier dynamically while computing the overall output for a given test instance: (1) in a 'referee-based system', each base classifier is paired with a referee that learns the area of expertise of its associated classifier and weights it accordingly; (2) a 'gating system', in contrast, learns to partition the input space among all the classifiers. Each referee in a referee-based system solves a two-class problem (whether or not to use its classifier), whereas a gating system solves an L-class problem, assigning the input to one of the L base classifiers. Our analysis on 20 datasets from different domains, with a classifier pool of 21 base learning algorithms, reveals that the gating system tends to concentrate on a few of the base classifiers, whereas using referees leads to a more balanced use of the base classifiers. Moreover, in the case of referees, it is better to use a small subset of base classifiers than a single one. The study shows that, with a well-trained selection unit (referee or gating), we can achieve accuracy as high as combining the decisions of all the base classifiers, with a drastic decrease in the number of base classifiers used, and can even improve accuracy.
The improvement is especially significant when none of the base classifiers has high accuracy, which indicates that classifier selection appears promising as a means of attacking hard learning problems.
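The two selection schemes described above can be contrasted in a minimal sketch. This is an illustrative toy, not the paper's implementation: the function names (`referee_select`, `gating_select`) and the constant-score referees are hypothetical, and the base classifiers are assumed to have already produced their predictions for a test instance.

```python
def referee_select(x, base_preds, referees):
    """Referee-based selection: each referee solves a two-class problem,
    estimating whether its associated classifier should be used on x.
    Base-classifier votes are weighted by these per-classifier scores."""
    weights = [r(x) for r in referees]  # one competence score per classifier
    votes = {}
    for w, y in zip(weights, base_preds):
        votes[y] = votes.get(y, 0.0) + w
    return max(votes, key=votes.get)    # class with the largest weighted vote

def gating_select(x, base_preds, gate):
    """Gating: a single L-class learner picks which of the L base
    classifiers to trust for x, and that classifier's output is returned."""
    return base_preds[gate(x)]
```

For example, with three base classifiers predicting `[0, 1, 1]` and referees returning competence scores `0.1, 0.8, 0.3`, the referee scheme outputs class 1 (weight 1.1 vs. 0.1); a gate that assigns the instance to classifier 0 outputs class 0.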
