Selecting Accurate Classifier Models for a MERS-CoV Dataset

Middle East Respiratory Syndrome Coronavirus (MERS-CoV) causes a viral respiratory disease that is spreading worldwide, which calls for a diagnosis system that accurately predicts infections. Data mining classifiers can greatly assist in enhancing the prediction accuracy of diseases in general. In this paper, classifier model performance was tested for two classification types, (1) binary and (2) multi-class, on a MERS-CoV dataset that consists of all reported cases in Saudi Arabia between 2013 and 2017. Cross-validation was applied to measure the accuracy of the Support Vector Machine (SVM), Decision Tree, and k-Nearest Neighbor (k-NN) classifiers. Experimental results demonstrate that the SVM and Decision Tree classifiers achieved the highest accuracy, 86.44%, for binary classification based on the healthcare personnel class. For multi-class classification based on the city class, the Decision Tree classifier had the highest accuracy of the three classifiers, although at 42.80% it did not reach a satisfactory level. This work is intended to be part of a MERS-CoV prediction system to enhance the diagnosis of MERS-CoV disease.
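The accuracy estimates above come from cross-validation. As a rough illustration of that procedure only, the following minimal sketch estimates the accuracy of a k-NN classifier with k-fold cross-validation using the Python standard library. Everything specific in it is an assumption made for illustration and is not taken from the paper: the synthetic two-class data (not the MERS-CoV dataset), the choice of 10 folds, k = 3, Euclidean distance, and the helper names `knn_predict`, `cross_val_accuracy`, and `make_toy_data`.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=10, k=3, seed=0):
    """Estimate accuracy by k-fold cross-validation (fold count is an assumption)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled indices into n_folds roughly equal folds.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Each sample is scored exactly once, by a model that never saw it.
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    # Accuracy pooled over all held-out predictions.
    return correct / len(X)


def make_toy_data(n_per_class=50, seed=1):
    """Hypothetical two-class data: two Gaussian clusters (NOT the MERS-CoV data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold CV accuracy (3-NN, toy data): {cross_val_accuracy(X, y):.2%}")
```

The same loop could wrap any classifier that offers a train-then-predict step, so the SVM and Decision Tree models could be compared on identical folds. The accuracy here is pooled over all held-out predictions rather than averaged per fold; the two differ slightly when fold sizes are unequal.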
