Screening for Prediabetes Using Machine Learning Models

The global prevalence of diabetes is increasing rapidly. Studies support the need for screening and intervention for prediabetes, which can progress to diabetes and lead to serious complications. This study aimed to develop a machine learning-based screening model for prediabetes. Data from the Korean National Health and Nutrition Examination Survey (KNHANES) were used, excluding subjects with diabetes. The KNHANES 2010 data (n = 4685) were used for training and internal validation, while the KNHANES 2011 data (n = 4566) were used for external validation. We developed two models to screen for prediabetes, one using an artificial neural network (ANN) and one using a support vector machine (SVM), and systematically evaluated both with internal and external validation. We compared their performance with that of a previously developed screening score model for prediabetes based on logistic regression analysis. In the external validation dataset, the SVM model achieved an area under the curve (AUC) of 0.731, higher than that of the ANN model (0.729) and the screening score model (0.712). The prescreening methods developed in this study performed better than the previously developed screening score model and may be a more effective method for prediabetes screening.
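The models above are compared by area under the ROC curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not KNHANES data, and the abstract does not state which software the authors used.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prediabetes, 0 = normal (not from the study).
labels = [0, 0, 1, 1, 0, 1]
scores_model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
scores_model_b = [0.30, 0.60, 0.20, 0.90, 0.10, 0.50]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

In an external-validation setting like the one described, the scores would come from a model fitted on one survey year and applied unchanged to the next; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.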
