Abstention-SMOTE: An over-sampling approach for imbalanced data classification

In recent years, the classification of imbalanced data has troubled most classification models because of the skewed class distribution. The Synthetic Minority Over-sampling Technique (SMOTE) is one data-level solution, but it does not take the distribution of the data set into account, so its results are often unsatisfactory. Building on SMOTE, this paper proposes an over-sampling method for imbalanced data classification called Abstention-SMOTE. First, we construct abstaining classifiers using ROC analysis. Then we use the abstaining classifiers to obtain the abstention positive samples, which include only the positive samples that are easily misclassified. Finally, we use these abstention positive samples to synthesize new positive samples that balance the data distribution. Experimental results indicate that our approach achieves better results than three other over-sampling methods: RO-sampling, SMOTE, and Borderline-SMOTE.
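The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how ROC analysis selects the abstention region, so the sketch simply assumes that two score thresholds, `lower` and `upper`, have already been chosen, and it treats positives whose classifier score falls between them as the abstention positive samples. The function names `abstention_positives` and `smote_from_seeds`, the neighbour count `k`, and the choice to search for neighbours among all positive samples are hypothetical and exist only for illustration.

```python
import numpy as np


def abstention_positives(X, y, scores, lower, upper):
    """Return positive samples whose score lies in the assumed abstention window.

    `lower` and `upper` stand in for thresholds that ROC analysis would
    supply; how they are derived is not specified here (assumption).
    """
    mask = (y == 1) & (scores >= lower) & (scores <= upper)
    return X[mask]


def smote_from_seeds(seeds, positives, n_new, k=5, rng=None):
    """SMOTE-style interpolation that starts only from the given seed samples.

    Each synthetic point lies on the segment between a randomly chosen seed
    and one of its k nearest neighbours among all positive samples.
    """
    rng = np.random.default_rng(rng)
    n_features = positives.shape[1]
    if len(seeds) == 0 or len(positives) < 2 or n_new <= 0:
        return np.empty((0, n_features))
    k = min(k, len(positives) - 1)
    synthetic = np.empty((n_new, n_features))
    for i in range(n_new):
        seed = seeds[rng.integers(len(seeds))]
        dist = np.linalg.norm(positives - seed, axis=1)
        # The nearest entry is assumed to be the seed itself (distance 0),
        # so it is skipped when picking neighbours.
        neighbours = np.argsort(dist)[1:k + 1]
        neighbour = positives[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic[i] = seed + gap * (neighbour - seed)
    return synthetic
```

In use, `scores` would come from any probabilistic classifier trained on the imbalanced data, and the output of `smote_from_seeds` would be appended to the training set with positive labels until the classes are balanced. Restricting the seeds to the abstention window is what separates this sketch from plain SMOTE, which interpolates from every positive sample.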
