An experimental study of one- and two-level classifier fusion for different sample sizes

Due to the wide variety of fusion techniques available for combining multiple classifiers into a more accurate one, several studies have sought to determine in which situations some fusion methods should be preferred over others. However, the sample-size behavior of the various fusion methods has so far received little attention in the multiple classifier systems literature. The main contribution of this paper is therefore to investigate the effect of training sample size on the relative performance of fusion methods and to gain more insight into the conditions under which some combination rules are superior. A large experiment is conducted to study the performance of several fixed and trainable combination rules for one- and two-level classifier fusion over a range of training sample sizes. The experimental results lead to the following conclusions. When one-level fusion is used to combine homogeneous or heterogeneous base classifiers, fixed rules outperform trainable ones in nearly all cases; the only exception is the fusion of heterogeneous classifiers with a large training sample. Moreover, the best classification for every sample size considered is generally achieved by a second level of combination, in which a fusion rule further combines a set of ensemble classifiers, each of which is itself constructed by fusing base classifiers. In this setting, it appears appropriate to use different types of fusion rules (fixed and trainable) as the combiners at the two levels.
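To make the one- versus two-level distinction concrete, the following is a minimal Python sketch using scikit-learn; it is an illustrative assumption, not the paper's experimental protocol. Level one fuses heterogeneous base classifiers with a fixed mean rule over their posterior probabilities; level two stacks the outputs of several such ensembles (here built on bootstrap resamples, an assumed design choice) and combines them with a trainable logistic-regression combiner. The helper `level1_ensemble` and the number of ensembles are hypothetical.

```python
# Sketch of one-level (fixed rule) vs. two-level (fixed rule + trainable combiner) fusion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

def level1_ensemble(X_tr, y_tr):
    """Fit heterogeneous base classifiers and fuse them with the fixed mean rule."""
    bases = [GaussianNB(),
             KNeighborsClassifier(n_neighbors=5),
             DecisionTreeClassifier(max_depth=5)]
    for clf in bases:
        clf.fit(X_tr, y_tr)
    def posteriors(X_new):
        # Fixed (non-trainable) fusion: average the class posteriors of the base classifiers.
        return np.mean([clf.predict_proba(X_new) for clf in bases], axis=0)
    return posteriors

# One-level fusion: a single ensemble trained on the full training set.
single = level1_ensemble(X_train, y_train)
acc_one_level = accuracy_score(y_test, single(X_test).argmax(axis=1))

# Two-level fusion: several level-1 ensembles on bootstrap resamples,
# combined by a trainable rule fitted on their stacked posteriors.
ensembles = []
for _ in range(5):
    idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap resample of the training set
    ensembles.append(level1_ensemble(X_train[idx], y_train[idx]))

def stack(X_new):
    # Concatenate the posterior outputs of all level-1 ensembles into one feature vector.
    return np.hstack([ens(X_new) for ens in ensembles])

combiner = LogisticRegression(max_iter=1000).fit(stack(X_train), y_train)
acc_two_level = accuracy_score(y_test, combiner.predict(stack(X_test)))

print(f"one-level (fixed mean rule):                 {acc_one_level:.3f}")
print(f"two-level (mean rule + trainable combiner):  {acc_two_level:.3f}")
```

The sketch mirrors the pairing suggested by the abstract: a fixed rule at the first level, where training data for a combiner is scarce, and a trainable rule at the second level, where the stacked posteriors provide a compact input on which a combiner can be fitted.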
