Minority Oversampling Using Sensitivity

The Synthetic Minority Oversampling Technique (SMOTE) is effective for handling imbalanced classification problems. However, its random candidate selection may cause severe overlap between classes and introduce new noise. Many SMOTE variants have been proposed to relieve these problems by generating new examples in safe regions. Most of these methods generate new examples from existing minority examples without considering the negative impact that class imbalance has had on those examples. In this paper, we address imbalanced classification using Bayes’ decision rule and propose a novel oversampling method, Minority Oversampling using Sensitivity (MOSS). Candidates for generating new examples are selected according to their sensitivity with respect to class imbalance. New examples are then generated by interpolating between the candidate and one of its adjacent examples. Experiments on 30 datasets confirm the superiority of MOSS over one baseline method and seven oversampling methods.
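The interpolation step described above can be illustrated with a minimal sketch. This is not the MOSS implementation: the abstract does not define the sensitivity measure, so the `weights` argument below is a hypothetical stand-in for it, and treating "adjacent examples" as the k nearest minority neighbours under Euclidean distance is likewise an assumption. With uniform weights the sketch reduces to SMOTE-style random candidate selection.

```python
import math
import random


def nearest_neighbors(point, pool, k):
    """Return the k points in pool closest to point (Euclidean), excluding point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def interpolate(candidate, neighbor, rng):
    """Return a random point on the line segment from candidate to neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [c + gap * (n - c) for c, n in zip(candidate, neighbor)]


def oversample(minority, n_new, k=3, weights=None, seed=0):
    """Generate n_new synthetic minority examples by interpolation.

    weights: hypothetical per-example selection weights. Uniform weights give
    SMOTE-style random selection; a sensitivity-based method would supply its
    own scores here (the measure itself is not reproduced in this sketch).
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(minority)
    synthetic = []
    for _ in range(n_new):
        # Pick a candidate in proportion to its weight.
        candidate = rng.choices(minority, weights=weights, k=1)[0]
        # Pick one of its k nearest minority neighbours (assumed adjacency).
        neighbor = rng.choice(nearest_neighbors(candidate, minority, k))
        synthetic.append(interpolate(candidate, neighbor, rng))
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_examples = oversample(minority, n_new=5, k=2)
```

Because every synthetic point lies on a segment between two existing minority examples, the choice of candidates determines where new examples land, which is why the selection weights matter.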
