Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

To alleviate the impact of imbalanced data on the support vector machine (SVM), a classification method for imbalanced data that combines hybrid sampling with ensemble learning is proposed. First, the imbalance ratio of the data is reduced by the ADASYN-NCL hybrid sampling method, which pairs Adaptive Synthetic Sampling (ADASYN) oversampling with Neighborhood Cleaning Rule (NCL) undersampling. Then, within the AdaBoost framework, different weight adjustments are applied to misclassified minority-class and majority-class samples, and several classifiers are selectively integrated to obtain better classification performance. Finally, 10 imbalanced data sets from the KEEL repository are used as test objects, with F-value and G-mean as the evaluation metrics, to verify the performance of the classification algorithm. The experimental results show that the proposed algorithm offers advantages in classifying imbalanced data sets.
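The two evaluation metrics named above can be illustrated with a short sketch. It assumes the standard definitions (F-value as the weighted harmonic mean of precision and recall on the minority class, G-mean as the geometric mean of sensitivity and specificity), which the abstract does not spell out; the function names, the `beta` parameter, and the toy labels are illustrative assumptions rather than the authors' code.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def f_value(y_true, y_pred, positive=1, beta=1.0):
    """F-value on the positive class (assumed standard F-beta; beta=1 gives F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta * beta * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta * beta) * precision * recall / denom

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy example (made-up labels): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, f_value(y_true, y_pred), g_mean(y_true, y_pred))
```

In this toy case accuracy is 0.8 even though only half of the minority samples are detected, while the F-value is 0.5. This is why metrics of this kind are generally preferred over plain accuracy on imbalanced data.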
