Predicting diabetes diseases using mixed data and supervised machine learning algorithms

Diabetes is considered one of the deadliest chronic diseases in many countries, which are working to prevent it at an early stage by diagnosing and predicting its symptoms with a variety of methods. The aim of this study is to compare the performance of several machine learning algorithms for predicting type 2 diabetes. In this paper, we apply and evaluate four machine learning algorithms (Decision Tree, K-Nearest Neighbours, Artificial Neural Network, and Deep Neural Network) to classify patients as having or not having type 2 diabetes mellitus. The models were trained and tested on two diabetes datasets: the first obtained from Frankfurt Hospital (Germany), and the second the well-known Pima Indian dataset. Both datasets contain the same features, composed of mixed data: risk factors and some clinical data. The algorithms were evaluated in two cases: on the datasets with noisy data (before preprocessing, when some values are missing) and on the datasets without noisy data (after preprocessing). The results, compared using evaluation metrics such as accuracy, sensitivity, and specificity, show the best performance with respect to the state of the art.
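Accuracy, sensitivity, and specificity are all derived from the confusion matrix of a binary classifier: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The following is a minimal sketch in Python, not the authors' evaluation code. It assumes label 1 denotes a patient with type 2 diabetes and label 0 a patient without, and the names `confusion_counts`, `binary_metrics`, and `BinaryMetrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate: share of diabetic patients detected
    specificity: float  # true negative rate: share of non-diabetic patients detected


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming 1 = diabetic and 0 = non-diabetic (hypothetical encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1:
            if p == 1:
                tp += 1  # diabetic patient correctly flagged
            else:
                fn += 1  # diabetic patient missed
        else:
            if p == 0:
                tn += 1  # non-diabetic patient correctly cleared
            else:
                fp += 1  # non-diabetic patient wrongly flagged
    return tp, tn, fp, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute accuracy, sensitivity, and specificity from binary labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty input")
    # Return NaN instead of dividing by zero when a class is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=sensitivity,
        specificity=specificity,
    )
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 0, 1]` and `y_pred = [1, 0, 1, 0, 0, 1, 1, 1]`, the counts are TP = 3, FN = 1, TN = 2, and FP = 2, giving an accuracy of 0.625, a sensitivity of 0.75, and a specificity of 0.5. Reporting sensitivity and specificity alongside accuracy matters here because accuracy alone can hide a classifier that misses many diabetic patients when the classes are imbalanced.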
