Electronic health record phenotyping improves detection and screening of type 2 diabetes in the general United States population: A cross-sectional, unselected, retrospective study

OBJECTIVES An estimated 25% of patients with type 2 diabetes mellitus (DM2) in the United States are undiagnosed because of inadequate screening, since administering laboratory tests to everyone is impractical. We assess whether electronic health record (EHR) phenotyping could improve DM2 screening compared with conventional models, even when records are incomplete and inconsistently recorded across patients and practice locations, as is typical in practice.
METHODS In this cross-sectional, retrospective study, EHR data from 9948 US patients were used to develop a pre-screening tool to predict current DM2, using multivariable logistic regression and, for out-of-sample validation, a random-forests probabilistic model. We compared (1) a full EHR model containing commonly prescribed medications, diagnoses (as ICD-9 categories), and conventional predictors; (2) a restricted EHR DX model, which excluded medications; and (3) a conventional model containing basic predictors and their interactions (BMI, age, sex, smoking status, and hypertension).
RESULTS Using a patient's full or restricted EHR was superior to using basic covariates alone for detecting individuals with diabetes (hierarchical χ² test, p < 0.001). Migraines, depot medroxyprogesterone acetate, and cardiac dysrhythmias were negatively associated with DM2, while a sexual and gender identity disorder diagnosis, viral and chlamydial infections, and herpes zoster were positively associated. Adding EHR phenotypes improved classification: with logistic regression, the AUCs for the full EHR model, EHR DX model, and conventional model were 84.9%, 83.2%, and 75.0%, respectively. For out-of-sample prediction with random forests, EHR phenotypes also improved performance, with AUCs of 81.3%, 79.6%, and 74.8%, respectively. The higher AUCs reflect better performance at most thresholds that balance sensitivity and specificity.
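The results report a hierarchical chi-square test showing that the EHR models outperform the conventional model. Assuming this is a likelihood-ratio (deviance) test between nested logistic regression models, the statistic is 2 × (log-likelihood of the larger model − log-likelihood of the smaller model), compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters in the larger model. The plain-Python sketch below illustrates that computation; the function names and the log-likelihood values in the usage line are hypothetical, not figures from the study, and in practice a statistics package would supply the chi-square tail probability.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with `df` degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series. Intended for moderate x
    (roughly x < 1000); p-values below about 1e-15 round to 0.0.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    # Terms may grow at first, then shrink geometrically once a + n > z.
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    log_lower = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_lower))


def likelihood_ratio_test(loglik_restricted: float, k_restricted: int,
                          loglik_full: float, k_full: int):
    """Compare a restricted model nested inside a larger (full) model.

    Returns (statistic, degrees of freedom, p-value), where the statistic is
    2 * (loglik_full - loglik_restricted) and df = k_full - k_restricted.
    """
    if k_full <= k_restricted:
        raise ValueError("the full model must have more parameters than the restricted one")
    stat = 2.0 * (loglik_full - loglik_restricted)
    df = k_full - k_restricted
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods and parameter counts, NOT values from the study.
stat, df, p = likelihood_ratio_test(-1200.0, 10, -1100.0, 40)
print(stat, df, p < 0.001)  # 200.0 30 True
```

The test is only valid when the smaller model's predictors are a subset of the larger model's, so it applies to the conventional versus EHR comparisons only if the models are in fact nested.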
CONCLUSIONS Based on the ROC curves, EHR phenotyping resulted in markedly superior detection of DM2, even in the face of missing and unsystematically recorded data. EHR phenotypes could more efficiently identify which patients do and do not require further laboratory screening. Scaling these results to the current number of undiagnosed individuals in the United States, we predict that incorporating EHR phenotype screening would identify an additional 400,000 patients with active, untreated diabetes compared with conventional pre-screening models.
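In use, a pre-screening model of this kind reduces to a threshold rule: patients whose predicted probability of DM2 meets a cutoff are referred for laboratory testing. The sketch below, in plain Python with invented toy scores rather than study data, computes the AUC as the probability that a randomly chosen patient with DM2 scores higher than one without (ties count one half), the sensitivity and specificity at a given referral threshold, and the ROC points obtained by sweeping that threshold. The function names are hypothetical, and this is an illustration of the standard definitions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive (label 1) outscores a random negative (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when patients with score >= threshold are referred for lab testing."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) pairs, lowering the referral threshold step by step."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, t)
        points.append((1.0 - spec, sens))
    return points


def trapezoid_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: three patients with DM2 (label 1) and three without (label 0).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))             # 8 of 9 positive/negative pairs ranked correctly
print(sens_spec(scores, labels, 0.5))  # referral cutoff 0.5 misses the 0.35 case
print(trapezoid_area(roc_points(scores, labels)))  # same value as the rank-based AUC
```

Lowering the cutoff from 0.5 to 0.3 in the toy example raises sensitivity from 2/3 to 1 at unchanged specificity, which is the kind of trade-off an ROC curve summarizes. The pairwise AUC is quadratic in the number of patients; for a cohort of about 10,000 a sort-based rank formula would be faster, but the pairwise form makes the probabilistic meaning of the AUC explicit.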
