Supervised Machine Learning Models for Prediction of COVID-19 Infection using Epidemiology Dataset

COVID-19 (2019-nCoV) is no longer a pandemic but rather an endemic disease, with more than 651,247 people around the world having lost their lives after contracting it. Currently, there is no specific treatment or cure for COVID-19, and thus living with the disease and its symptoms is inevitable. This reality has placed a massive burden on limited healthcare systems worldwide, especially in developing nations. Although neither an effective, clinically proven antiviral strategy nor an approved vaccine exists to eradicate COVID-19, there are alternatives that may reduce the huge burden on not only limited healthcare systems but also the economic sector; the most promising include non-clinical techniques such as machine learning, data mining, deep learning, and other artificial intelligence methods. These alternatives would facilitate diagnosis and prognosis for COVID-19 patients. In this work, supervised machine learning models for COVID-19 infection were developed using logistic regression, decision tree, support vector machine, naive Bayes, and artificial neural network algorithms on a labeled epidemiology dataset of positive and negative COVID-19 cases from Mexico. Before the models were developed, a correlation coefficient analysis was carried out to determine the strength of the relationship between each dependent feature and independent feature of the dataset. Eighty percent (80%) of the dataset was used for training the models, while the remaining 20% was used for testing them. The performance evaluation showed that the decision tree model has the highest accuracy, 94.99%, while the support vector machine model has the highest sensitivity, 93.34%, and the naive Bayes model has the highest specificity, 94.30%.
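The evaluation protocol described above (correlation analysis, an 80/20 split, then accuracy, sensitivity, and specificity) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the data are synthetic, the single `score` feature is hypothetical, and a simple threshold rule stands in for a trained model. In practice, any of the five algorithms named above would supply the predictions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_80_20(rows, seed=0):
    """Shuffle the rows and return (train, test) in an 80/20 proportion."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 8 // 10
    return rows[:cut], rows[cut:]

def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Synthetic stand-in data (assumption): one numeric feature and a 0/1 label.
rng = random.Random(42)
rows = []
for _ in range(1000):
    label = 1 if rng.random() < 0.5 else 0
    score = rng.gauss(1.0 if label else -1.0, 1.0)
    rows.append((score, label))

# Correlation between the feature and the label, computed before modelling.
corr = pearson([s for s, _ in rows], [l for _, l in rows])

train, test = split_80_20(rows)

# Stand-in "model" (assumption): threshold at the midpoint of the class means,
# estimated on the training split only.
pos = [s for s, l in train if l == 1]
neg = [s for s, l in train if l == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

y_true = [l for _, l in test]
y_pred = [1 if s > threshold else 0 for s, _ in test]
metrics = evaluate(y_true, y_pred)

print(f"feature/label correlation: {corr:.3f}")
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters here because the best model differs by metric: a model that misses few positive cases is not necessarily the one that raises the fewest false alarms.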

[22]  Mei U Wong,et al.  COVID-19 Coronavirus Vaccine Design Using Reverse Vaccinology and Machine Learning , 2020, bioRxiv.

[23]  Low Tan Jung,et al.  An Improved C4.5 Data Mining Driven Algorithm for the Diagnosis of Coronary Artery Disease , 2019, 2019 International Conference on Digitization (ICD).

[24]  Malik Magdon-Ismail Machine Learning the Phenomenology of COVID-19 From Early Infection Dynamics , 2020, medRxiv.

[25]  W. Liang,et al.  Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions , 2020, Journal of thoracic disease.

[26]  Safial Islam Ayon,et al.  Wearable Technology to Assist the Patients Infected with Novel Coronavirus (COVID-19) , 2020, SN Comput. Sci..

[27]  Pramod Singh Supervised Machine Learning , 2019 .

[28]  Ibrahim A. Mohammed,et al.  Multi Query Optimization Algorithm Using Semantic and Heuristic Approaches , 2016 .

[29]  Jianjun Gao,et al.  Discovering drugs to treat coronavirus disease 2019 (COVID-19). , 2020, Drug discoveries & therapeutics.

[30]  Sani Salisu,et al.  Using Decision Tree Data Mining Algorithm to Predict Causes of Road Traffic Accidents, its Prone Locations and Time along Kano –Wudil Highway , 2017 .

[31]  Sharareh R Niakan Kalhori,et al.  Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study , 2020, JMIR Public Health and Surveillance.

[32]  Elisabeth Mahase,et al.  China coronavirus: what do we know so far? , 2020, BMJ.

[33]  Adrien P. Genoud,et al.  A comparison of supervised machine learning algorithms for mosquito identification from backscattered optical signals , 2020, Ecol. Informatics.

[34]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[35]  Gyu Sang Choi,et al.  COVID-19 Future Forecasting Using Supervised Machine Learning Models , 2020, IEEE Access.

[36]  Mohammed Elmusrati,et al.  Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer , 2019, Int. J. Medical Informatics.

[37]  L. J. Muhammad,et al.  Fuzzy based Expert System for Diagnosis of Diabetes Mellitus , 2021 .

[38]  Gurjit S. Randhawa,et al.  Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study , 2020, bioRxiv.

[39]  Yan Zhao,et al.  Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China. , 2020, JAMA.

[40]  Alimuddin Zumla Mandell, Douglas, and Bennett's principles and practice of infectious diseases , 2010, The Lancet Infectious Diseases.

[41]  Muhammad Lawan Jibril,et al.  Power of Artificial Intelligence to Diagnose and Prevent Further COVID-19 Outbreak: A Short Communication , 2020, ArXiv.

[42]  Gurjit S. Randhawa,et al.  Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study , 2020, PloS one.

[43]  Safial Islam Ayon,et al.  Predictive Data Mining Models for Novel Coronavirus (COVID-19) Infected Patients’ Recovery , 2020, SN Comput. Sci..

[44]  L. J. Muhammad,et al.  Security Challenges for Building Knowledge-Based Economy in Nigeria , 2015 .

[45]  Dianbo Liu,et al.  A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models , 2020, ArXiv.

[46]  R. Saravanan,et al.  A State of Art Techniques on Machine Learning Algorithms: A Perspective of Supervised Learning Approaches in Data Classification , 2018, 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS).