Classification and prediction of diabetes disease using machine learning paradigm

Background and objectives: Diabetes is a chronic disease characterized by high blood sugar. It can lead to serious complications such as stroke, kidney failure, and heart attack. About 422 million people worldwide had diabetes in 2014, and the figure is projected to reach 642 million by 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients.

Materials and methods: Logistic regression (LR) is used to identify the risk factors for diabetes based on p value and odds ratio (OR). Four classifiers, naïve Bayes (NB), decision tree (DT), AdaBoost (AB), and random forest (RF), are used to predict diabetic patients. Three partition protocols (K2, K5, and K10) are adopted, and each protocol is repeated for 20 trials. The performance of the classifiers is evaluated using accuracy (ACC) and area under the curve (AUC).

Results: The diabetes dataset is derived from the 2009–2012 National Health and Nutrition Examination Survey. It consists of 6561 respondents: 657 diabetic patients and 5904 controls. The LR model identifies 7 of the 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) as risk factors for diabetes. The overall ACC of the ML-based system is 90.62%. The combination of LR-based feature selection and the RF-based classifier gives 94.25% ACC and 0.95 AUC for the K10 protocol.

Conclusion: The combination of LR-based feature selection and the RF-based classifier performs better than the other combinations evaluated. This combination should be helpful for predicting diabetic patients.
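
The evaluation protocol described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: it assumes that K2, K5, and K10 denote 2-, 5-, and 10-fold cross-validation, that each protocol is reshuffled for each of the 20 trials, and that ACC and AUC are averaged over all folds. The helper names (`k_fold_indices`, `evaluate`, `majority_baseline`) and the seed are hypothetical. Only the class counts (657 diabetic, 5904 controls) come from the abstract; no features are modeled, and the "classifier" is a constant baseline that stands in for NB, DT, AB, or RF.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def evaluate(y, k, trials, fit_predict, seed=0):
    """Repeat k-fold cross-validation `trials` times; return mean ACC and AUC."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(trials):
        folds = k_fold_indices(len(y), k, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            labels, scores = fit_predict(train_idx, test_idx)
            y_test = [y[i] for i in test_idx]
            accs.append(accuracy(y_test, labels))
            aucs.append(auc(y_test, scores))
    return mean(accs), mean(aucs)


def majority_baseline(train_idx, test_idx):
    """Hypothetical stand-in classifier: label everyone as control (0)."""
    return [0] * len(test_idx), [0.0] * len(test_idx)


# Labels only, using the class counts reported in the abstract.
y = [1] * 657 + [0] * 5904

# Assumed reading of the protocols: K2, K5, K10 = 2-, 5-, 10-fold, 20 trials each.
results = {k: evaluate(y, k, trials=20, fit_predict=majority_baseline) for k in (2, 5, 10)}
for k, (acc, area) in results.items():
    print(f"K{k}: ACC={acc:.4f} AUC={area:.2f}")
```

With these class counts, labeling every respondent as a control already yields an accuracy of about 0.90 (5904/6561) while its AUC is 0.5, so AUC is a useful companion to ACC on this dataset. A real classifier would replace `majority_baseline` with a function that fits on `train_idx` and returns predicted labels and scores for `test_idx`.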
