Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study.

AIMS The aim of this study was to create a prediction model, using a data mining approach, to identify individuals at low risk of incident type 2 diabetes in the Tehran Lipid and Glucose Study (TLGS) database.

METHODS A prediction model was developed using decision tree classification in a population of 6647 individuals without diabetes, aged ≥20 years, who were followed for 12 years. During follow-up, 729 (11%) cases of diabetes occurred. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history, and laboratory measures.

RESULTS The decision tree models were developed using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for subjects without diabetes, precision and F-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status.

CONCLUSIONS Decision tree analysis using routine demographic, clinical, anthropometric and laboratory measurements produced a simple tool for identifying individuals at low risk of type 2 diabetes.
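The performance measures reported in RESULTS (accuracy, sensitivity, specificity, and precision and F-measure for the class without diabetes) all derive from a single confusion matrix. The sketch below shows one way to compute them. The class `ConfusionCounts`, the function `low_risk_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary diabetes / no-diabetes classifier (hypothetical helper)."""
    tp: int  # developed diabetes, predicted diabetes
    fn: int  # developed diabetes, predicted no diabetes
    fp: int  # no diabetes, predicted diabetes
    tn: int  # no diabetes, predicted no diabetes


def low_risk_metrics(c: ConfusionCounts) -> dict:
    """Return the five measures named in the abstract.

    Precision and F-measure are computed for the *no diabetes* class,
    i.e. the low-risk group the model is meant to identify.
    """
    total = c.tp + c.fn + c.fp + c.tn
    sensitivity = c.tp / (c.tp + c.fn)    # recall for the diabetes class
    specificity = c.tn / (c.tn + c.fp)    # recall for the no-diabetes class
    precision_neg = c.tn / (c.tn + c.fn)  # share of predicted low-risk who stay diabetes-free
    f_neg = 2 * precision_neg * specificity / (precision_neg + specificity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision_no_diabetes": precision_neg,
        "f_measure_no_diabetes": f_neg,
    }


# Illustrative counts only (assumption): an imbalanced sample of 1000 people,
# 100 of whom develop diabetes.
example = ConfusionCounts(tp=30, fn=70, fp=20, tn=880)
metrics = low_risk_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced outcome such as this one, a classifier can score well on accuracy and on the measures for the majority class while still missing most incident cases. That pattern is consistent with a model aimed at identifying low risk individuals rather than at detecting future cases.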
