Comparison of three data mining models for predicting diabetes or prediabetes by risk factors

This study compared the performance of logistic regression, artificial neural network (ANN) and decision tree models for predicting diabetes or prediabetes from common risk factors. Participants were recruited from two communities in Guangzhou, China: 735 patients confirmed to have diabetes or prediabetes and 752 normal controls. A standard questionnaire was administered to obtain information on demographic characteristics, family diabetes history, anthropometric measurements and lifestyle risk factors. We then developed three predictive models using 12 input variables and one output variable from the questionnaire information, and evaluated them in terms of accuracy, sensitivity and specificity. The logistic regression model achieved a classification accuracy of 76.13% with a sensitivity of 79.59% and a specificity of 72.74%. The ANN model reached a classification accuracy of 73.23% with a sensitivity of 82.18% and a specificity of 64.49%. The decision tree model (C5.0) achieved a classification accuracy of 77.87% with a sensitivity of 80.68% and a specificity of 75.13%. The decision tree model (C5.0) had the best classification accuracy, followed by the logistic regression model; the ANN had the lowest accuracy.
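The three evaluation measures can be illustrated with a minimal Python sketch. The function and variable names are hypothetical, and the true-positive and true-negative counts are not given in the abstract: they are back-calculated here from the reported percentages and the group sizes (735 cases, 752 controls), on the assumption that each model was scored on all 1487 participants.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    sensitivity: float
    specificity: float


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Metrics:
    """Return accuracy, sensitivity and specificity as percentages."""
    total = tp + fn + tn + fp
    return Metrics(
        accuracy=100.0 * (tp + tn) / total,
        sensitivity=100.0 * tp / (tp + fn),  # share of cases correctly flagged
        specificity=100.0 * tn / (tn + fp),  # share of controls correctly cleared
    )


N_CASES = 735     # diabetes or prediabetes (from the abstract)
N_CONTROLS = 752  # normal controls (from the abstract)

# ASSUMPTION: (true positives, true negatives) inferred from the reported
# percentages; these counts do not appear in the source.
inferred_counts = {
    "logistic regression": (585, 547),
    "ANN": (604, 485),
    "decision tree (C5.0)": (593, 565),
}

results = {
    name: classification_metrics(tp, N_CASES - tp, tn, N_CONTROLS - tn)
    for name, (tp, tn) in inferred_counts.items()
}

for name, m in results.items():
    print(f"{name}: accuracy={m.accuracy:.2f}%, "
          f"sensitivity={m.sensitivity:.2f}%, specificity={m.specificity:.2f}%")
```

Rounded to two decimals, these inferred counts reproduce the percentages reported above, which is consistent with (but does not prove) the assumption that the metrics were computed over the full sample.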
