A model for early prediction of diabetes

Abstract: Diabetes is a common chronic disease. Predicting it at an early stage can lead to improved treatment, and data mining techniques are widely used for early disease prediction. In this paper, diabetes is predicted using significant attributes, and the relationships among those attributes are characterized. Several tools are used for attribute selection, clustering, prediction, and association rule mining. Significant attributes were selected via principal component analysis (PCA). Association rule mining with the Apriori method revealed a strong association of diabetes with body mass index (BMI) and with glucose level. Artificial neural network (ANN), random forest (RF), and K-means clustering techniques were implemented for the prediction of diabetes. The ANN achieved the best accuracy, 75.7%, and may be useful in assisting medical professionals with treatment decisions.
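
To make the association rule mining step concrete, the following is a minimal sketch of level-wise Apriori mining in plain Python. It is an illustration only, not the authors' implementation: the toy transactions, the item names (`glucose_high`, `bmi_high`, `diabetes`), and the `min_support` threshold are hypothetical assumptions, since the abstract gives neither the dataset nor the parameters used.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the discretized attributes
# present for one patient. These are NOT data from the paper.
transactions = [
    {"glucose_high", "bmi_high", "diabetes"},
    {"glucose_high", "bmi_high", "diabetes"},
    {"glucose_high", "diabetes"},
    {"bmi_high"},
    {"glucose_high", "bmi_high"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built level by level."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# min_support=0.4 is an assumed threshold chosen for the toy data.
frequent = apriori(transactions, min_support=0.4)
rule_conf = confidence(frozenset({"glucose_high", "bmi_high"}),
                       frozenset({"diabetes"}), transactions)
```

A rule such as {glucose_high, bmi_high} → {diabetes} would be reported when both its support and its confidence exceed chosen thresholds; on real clinical data, continuous attributes like BMI and glucose would first need to be discretized into such items.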
