Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques

In machine learning, class imbalance in a dataset tends to make the resulting model perform less than optimally. In theory, single classifiers are weak under class imbalance because most of them tend to learn the patterns of the majority class when the dataset is not balanced, so their performance cannot be maximized. This study introduces two approaches to deal with class imbalance: the first uses ADASYN for resampling, and the second uses the Stacking algorithm for meta-learning. The proposed method stacks the SVM and Random Forest algorithms and adds ADASYN to the resampling process. Tests on five datasets with different imbalance ratios show that the proposed method produced the highest G-mean and AUC scores compared with the other classification algorithms. Hence, the proposed method can be a solution for handling class imbalance in datasets. However, this study is limited to datasets with binary classes, so future work should test the method on imbalanced multiclass datasets.
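
Two ingredients named above can be illustrated without an ML library: ADASYN resampling and the G-mean and AUC evaluation scores. The following is a minimal, stdlib-only Python sketch, assuming numeric feature vectors and binary labels with `1` as the minority class. The function names `adasyn`, `g_mean`, and `auc` and their parameters (`k`, `beta`, `seed`) are hypothetical and are not the study's implementation. The ADASYN part follows the steps of He et al. (2008) in simplified form (the imbalance-degree threshold is omitted), and the SVM plus Random Forest stacking step is not reproduced.

```python
import math
import random

# Hypothetical helpers for illustration only; not the study's implementation.


def _dist(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Generate synthetic minority samples following simplified ADASYN steps.

    X is a list of numeric feature vectors and y the matching class labels.
    Returns only the new samples; append them to the training set yourself.
    Brute-force neighbour search, so this is for small illustrative data only.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_maj = len(y) - len(min_idx)
    # G = (m_l - m_s) * beta: how many synthetic samples to create in total.
    G = int((n_maj - len(min_idx)) * beta)
    if G <= 0 or len(min_idx) < 2:
        return []

    # r_i = share of majority-class points among the k nearest neighbours of
    # each minority point x_i (neighbours searched in the whole dataset).
    ratios = []
    for i in min_idx:
        neigh = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: _dist(X[i], X[j]))[:k]
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))

    total = sum(ratios)
    if total == 0:
        # No minority point has majority neighbours; this sketch adds nothing.
        return []

    synthetic = []
    for i, r in zip(min_idx, ratios):
        # g_i = normalised r_i * G: harder points receive more synthetic samples.
        g_i = round(r / total * G)
        # Interpolation partners: k nearest neighbours within the minority class.
        partners = sorted((j for j in min_idx if j != i),
                          key=lambda j: _dist(X[i], X[j]))[:k]
        for _ in range(g_i):
            z = X[rng.choice(partners)]
            lam = rng.random()  # lambda drawn uniformly from [0, 1)
            synthetic.append([a + (b - a) * lam for a, b in zip(X[i], z)])
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """ROC AUC: chance that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `adasyn(X_train, y_train, k=5)` returns synthetic rows to append to the training set. Resampling would be applied to training data only, with `g_mean` and `auc` computed on held-out predictions. The stacking step itself (SVM and Random Forest base learners feeding a meta-learner) would typically come from a library such as scikit-learn's `StackingClassifier`.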
