An experimental comparison of MES aggregation rules in case of imbalanced datasets

Learning from imbalanced datasets is difficult because traditional algorithms are biased towards the majority class, yielding low predictive accuracy on the minority class. Among the many methods proposed in the literature to overcome this limitation, the most recent use a multi-expert system (MES) composed of balanced classifiers whose decisions are aggregated according to a combination rule. Each classifier of the MES is trained on a balanced subset of the original training set (TS), which can be obtained by applying different division methods. This paper explores how different MES combination rules perform on imbalanced training sets, experimentally comparing fusion and selection combination criteria. Furthermore, two methods have been used to divide the original TS: random selection and clustering. The results confirm and extend previous findings reported in the literature, showing that, on the one hand, combination rules belonging to the selection framework outperform the others and, on the other hand, dividing the original training set via random selection rather than clustering attains better performance.
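The core construction described above can be illustrated with a minimal sketch: the majority class is divided by random selection into chunks the size of the minority class, each chunk is paired with the full minority set to train one balanced expert, and the experts' decisions are then fused. The function names (`balanced_subsets`, `majority_vote`) and the use of majority voting as the fusion rule are illustrative assumptions, not the paper's exact procedure.

```python
import random

def balanced_subsets(majority, minority, seed=0):
    """Divide the majority class by random selection into chunks the size of
    the minority class, pairing each chunk with the full minority set so that
    every expert sees a balanced training subset (hypothetical sketch)."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    n = len(minority)
    k = len(majority) // n  # number of balanced experts in the MES
    return [shuffled[i * n:(i + 1) * n] + minority for i in range(k)]

def majority_vote(predictions):
    """One simple fusion rule: each expert votes and the most frequent
    label wins (selection rules would instead pick a single expert)."""
    return max(set(predictions), key=predictions.count)
```

A clustering-based division would replace the random shuffle with a partition of the majority class into clusters, but the overall MES structure is the same.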
