Using Information on Class Interrelations to Improve Classification of Multiclass Imbalanced Data: A New Resampling Algorithm

Abstract The relations between multiple imbalanced classes can be handled with a specialized approach which evaluates types of examples’ difficulty based on an analysis of the class distribution in the examples’ neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. It has led us to the introduction of a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.

[1]  Bartosz Krawczyk,et al.  Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets , 2016, Pattern Recognit..

[2]  Jerzy Stefanowski,et al.  Multi-class and feature selection extensions of Roughly Balanced Bagging for imbalanced data , 2018, Journal of Intelligent Information Systems.

[3]  Gustavo E. A. P. A. Batista,et al.  Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior , 2004, MICAI.

[4]  Mateusz Lango,et al.  Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study , 2019, Foundations of Computing and Decision Sciences.

[5]  José Salvador Sánchez,et al.  An Empirical Study of the Behavior of Classifiers on Imbalanced and Overlapped Data Sets , 2007, CIARP.

[6]  Szymon Wilk,et al.  Learning from Imbalanced Data in Presence of Noisy and Borderline Examples , 2010, RSCTC.

[7]  Pedro Antonio Gutiérrez,et al.  A dynamic over-sampling procedure based on sensitivity for multi-class problems , 2011, Pattern Recognit..

[8]  Francisco Herrera,et al.  Learning from Imbalanced Data Sets , 2018, Springer International Publishing.

[9]  Jerzy Stefanowski,et al.  Evaluating Difficulty of Multi-class Imbalanced Data , 2017, ISMIS.

[10]  Krzysztof Krawiec,et al.  Exploring complex and big data , 2017, Int. J. Appl. Math. Comput. Sci..

[11]  Mikel Galar,et al.  Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches , 2013, Knowl. Based Syst..

[12]  Francisco Herrera,et al.  An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[13]  Sattar Hashemi,et al.  To Combat Multi-Class Imbalanced Problems by Means of Over-Sampling Techniques , 2016, IEEE Transactions on Knowledge and Data Engineering.

[14]  Jerzy Stefanowski,et al.  Neighbourhood sampling in bagging for imbalanced data , 2015, Neurocomputing.

[15]  Francisco Herrera,et al.  An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes , 2011, Pattern Recognit..

[16]  Herna L. Viktor,et al.  SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling , 2015, 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K).

[17]  Jerzy Stefanowski,et al.  Dealing with Data Difficulty Factors While Learning from Imbalanced Data , 2016, Challenges in Computational Statistics and Data Mining.

[18]  Taeho Jo,et al.  Class imbalances versus small disjuncts , 2004, SKDD.

[19]  Zhi-Hua Zhou,et al.  ON MULTI‐CLASS COST‐SENSITIVE LEARNING , 2006, Comput. Intell..

[20]  Yunqian Ma,et al.  Foundations of Imbalanced Learning , 2013 .

[21]  Bartosz Krawczyk,et al.  Learning from imbalanced data: open challenges and future directions , 2016, Progress in Artificial Intelligence.

[22]  Xin Yao,et al.  Multiclass Imbalance Problems: Analysis and Potential Solutions , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[23]  Jerzy Stefanowski,et al.  Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data , 2013 .

[24]  Jerzy Stefanowski,et al.  Types of minority class examples and their influence on learning classifiers from imbalanced data , 2015, Journal of Intelligent Information Systems.

[25]  Jorma Laurikkala,et al.  Improving Identification of Difficult Small Classes by Balancing Class Distribution , 2001, AIME.