Adaptive Synthetic-Nominal (ADASYN-N) and Adaptive Synthetic-KNN (ADASYN-KNN) for Multiclass Imbalance Learning on Laboratory Test Data

Annually, about 1,500 cases of cervical cancer are found in Indonesia, which makes Indonesia the country with the highest number of cervical cancer cases in the world. Cervical cancer screening and HPV testing are done with a Pap smear test. However, this examination is time-consuming, costly, and highly susceptible to observer bias during examination and analysis. To overcome these problems, several studies have applied machine learning with a variety of approaches. However, these studies are constrained by the limited amount of data and by class imbalance caused by the different proportions of each case type. Imbalance can lead to misclassification of the minority classes, because classifiers tend to favor the majority class. This study addressed imbalanced data in the classification of Pap test results using over-sampling. The ADASYN-N and ADASYN-KNN algorithms were proposed as extensions of the ADASYN algorithm to handle datasets with nominal attributes, and the SMOTE-N algorithm was included for comparison. ADASYN-KNN with preference “0” gave the highest accuracy, precision, recall, and F-score: 95.38%, 95.583%, 95.383%, and 95.283%, respectively. The highest ROC area, 99.183%, was obtained by ADASYN-KNN with preference “1”.
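The core idea of ADASYN is to give each minority sample a share of the synthetic-sample budget proportional to how many of its nearest neighbours belong to other classes, so harder, border-region samples receive more synthetic copies. The abstract does not describe how ADASYN-N and ADASYN-KNN adapt this to nominal attributes, so the Python sketch below is not the authors' method. It assumes a Hamming (overlap) distance instead of Euclidean distance (SMOTE-N itself uses a Value Difference Metric), and it generates attribute values by a randomized majority vote over the seed and its minority neighbours, loosely modeled on SMOTE-N, instead of numeric interpolation. The names `hamming`, `nearest`, and `adasyn_nominal` are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Overlap distance for nominal attributes: number of positions that differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(i, candidates, X, k):
    """Indices of the k candidates closest to X[i] under Hamming distance (excluding i)."""
    pool = [j for j in candidates if j != i]
    pool.sort(key=lambda j: hamming(X[i], X[j]))  # stable sort: ties keep index order
    return pool[:k]


def adasyn_nominal(X, y, minority, k=5, beta=1.0, seed=0):
    """Return synthetic nominal rows for class `minority` using ADASYN-style weighting.

    Assumptions (not from the paper): Hamming distance, balancing against the
    largest class, and a randomized majority vote for generating values.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    largest = max(Counter(y).values())
    # G: total number of synthetic rows needed to reach balance level beta.
    G = int((largest - len(minority_idx)) * beta)
    if G <= 0 or len(minority_idx) < 2:
        return []

    # r_i: share of each minority row's k nearest neighbours (whole data set)
    # that belong to another class.
    r = []
    for i in minority_idx:
        nn = nearest(i, range(len(X)), X, k)
        r.append(sum(y[j] != minority for j in nn) / len(nn))
    total = sum(r)
    if total == 0:
        return []  # no minority row borders another class

    synthetic = []
    for i, ri in zip(minority_idx, r):
        g_i = round(ri / total * G)  # per-row quota; rounded quotas may not sum exactly to G
        donors = [X[i]] + [X[j] for j in nearest(i, minority_idx, X, k)]
        for _ in range(g_i):
            # Randomized majority vote (assumption): each attribute copies the value of a
            # random donor, so values frequent among the minority neighbours are more likely.
            synthetic.append(tuple(rng.choice(donors)[a] for a in range(len(X[i]))))
    return synthetic


if __name__ == "__main__":
    X = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "small"),
         ("blue", "large"), ("red", "small"), ("blue", "large"), ("green", "large")]
    y = ["maj"] * 6 + ["min"] * 2
    print(adasyn_nominal(X, y, "min", k=3))
```

For the multiclass setting the paper targets, one would presumably call `adasyn_nominal` once per non-majority class; balancing every class against the largest one is a design choice of this sketch, not something the abstract specifies.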
