A multi-objective optimisation approach for class imbalance learning

Class imbalance limits the performance of most learning algorithms, since they cannot cope with large differences between the number of samples in each class, resulting in low predictive accuracy over the minority class. To address this, several papers have proposed algorithms that aim at more balanced performance. However, balancing the recognition accuracies for each class very often harms the global accuracy: the accuracy over the minority class increases while the accuracy over the majority class decreases. This paper proposes an approach to overcome this limitation: for each classification act, it chooses between the output of a classifier trained on the original skewed distribution and the output of a classifier trained according to a learning method addressing the curse of imbalanced data. This choice is driven by a parameter whose value maximizes, on a validation set, two objective functions, i.e. the global accuracy and the accuracies for each class. A series of experiments on ten public datasets with different proportions between the majority and minority classes shows that the proposed approach provides more balanced recognition accuracies than classifiers trained according to traditional learning methods for imbalanced data, as well as larger global accuracy than classifiers trained on the original skewed distribution.
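
The selection scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the parameter drives the choice or how the two objectives are combined, so several things here are assumptions made for illustration only: the parameter is taken to be a threshold `t` on a confidence score of the balanced classifier, the per-class accuracies are summarised by their geometric mean, and the two objectives are scalarised by a plain sum. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def combine(pred_orig: Sequence[int], pred_bal: Sequence[int],
            conf_bal: Sequence[float], t: float) -> List[int]:
    """Per sample, keep the balanced classifier's output when its
    confidence is at least t; otherwise fall back to the classifier
    trained on the original skewed distribution (assumed rule)."""
    return [b if c >= t else o
            for o, b, c in zip(pred_orig, pred_bal, conf_bal)]


def global_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples classified correctly."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def gmean_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of the per-class accuracies (assumed summary
    of 'the accuracies for each class')."""
    accs = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        accs.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return math.prod(accs) ** (1.0 / len(accs))


def select_threshold(y_val: Sequence[int], pred_orig: Sequence[int],
                     pred_bal: Sequence[int], conf_bal: Sequence[float],
                     candidates: Sequence[float]) -> float:
    """Pick the candidate t that maximises, on the validation set,
    global accuracy + g-mean (an assumed scalarisation)."""
    def score(t: float) -> float:
        y_hat = combine(pred_orig, pred_bal, conf_bal, t)
        return global_accuracy(y_val, y_hat) + gmean_class_accuracy(y_val, y_hat)
    return max(candidates, key=score)


# Toy validation set: 6 majority (0) and 2 minority (1) samples.
y_val = [0, 0, 0, 0, 0, 0, 1, 1]
pred_orig = [0, 0, 0, 0, 0, 0, 0, 0]   # skewed classifier: always majority
pred_bal = [0, 0, 0, 1, 1, 1, 1, 1]    # balanced classifier: over-predicts minority
conf_bal = [0.9, 0.9, 0.9, 0.4, 0.4, 0.6, 0.8, 0.8]

best_t = select_threshold(y_val, pred_orig, pred_bal, conf_bal,
                          candidates=[0.0, 0.5, 0.7, 1.0])
combined = combine(pred_orig, pred_bal, conf_bal, best_t)
```

In this toy case the balanced classifier is trusted only where it is confident, so the combined output recovers the minority samples without giving up the majority-class accuracy of the skewed classifier. A different scalarisation, or a Pareto-based selection, would fit the same skeleton.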
