Screening diabetes mellitus 2 based on electronic health records using temporal features

The prevalence of type 2 diabetes mellitus is increasing worldwide. Current treatments for diabetes remain inadequate; prevention through screening is therefore the most appropriate way to reduce the burden of diabetes and its complications. We propose a new prognostic approach for type 2 diabetes mellitus based on electronic health records that does not rely on the invasive tests currently used for the disease (e.g. blood glucose level or glycated hemoglobin (HbA1c)). Our methodology uses machine learning frameworks with data enrichment through temporal features. Using a random forest classifier, our predictive model achieved an area under the receiver operating characteristic curve (AUC) of 84.22 percent when using data from 2009 to 2011 to predict which patients were diabetic in 2012, 83.19 percent when temporal features were included, and 83.72 percent after applying both temporal features and feature selection. We conclude that predicting the pathology is possible and efficient using each patient's progression over the years, without the invasive techniques currently used to classify type 2 diabetes mellitus.
