Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study

OBJECTIVE Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care at five institutions with different electronic medical record (EMR) systems.

MATERIALS AND METHODS An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated against clinician review at three of the five participating institutions. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.

RESULTS The algorithm achieved positive predictive values of 98% for the identification of diabetic cases and 100% for the identification of controls, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. The subsequent GWAS, using data from all five institutions, replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.

DISCUSSION By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified, and the resulting study replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the large numbers required for GWAS.

CONCLUSIONS An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.
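
The positive predictive values reported above compare algorithm-assigned labels against clinician review. As a minimal sketch of that arithmetic, assuming each reviewed chart carries one algorithm label and one reviewer label, the calculation might look like the following. The function name, the label strings, and the six-record toy data are hypothetical illustrations and are not taken from the study.

```python
from typing import Sequence


def positive_predictive_value(
    algorithm_labels: Sequence[str],
    reviewer_labels: Sequence[str],
    target: str,
) -> float:
    """Fraction of records the algorithm labeled `target` that the reviewer confirmed.

    PPV = true positives / (true positives + false positives).
    """
    if len(algorithm_labels) != len(reviewer_labels):
        raise ValueError("label sequences must have the same length")

    # Keep only the records the algorithm flagged as `target`.
    flagged = [
        reviewer
        for algorithm, reviewer in zip(algorithm_labels, reviewer_labels)
        if algorithm == target
    ]
    if not flagged:
        raise ValueError(f"algorithm assigned no records to {target!r}")

    confirmed = sum(1 for reviewer in flagged if reviewer == target)
    return confirmed / len(flagged)


# Hypothetical toy data (not study data): six reviewed charts.
algorithm = ["case", "case", "case", "case", "control", "control"]
reviewer = ["case", "case", "case", "control", "control", "control"]

ppv_cases = positive_predictive_value(algorithm, reviewer, "case")
ppv_controls = positive_predictive_value(algorithm, reviewer, "control")
print(f"PPV cases: {ppv_cases:.2f}, PPV controls: {ppv_controls:.2f}")
```

Computing the value separately for cases and for controls mirrors how the abstract reports the two figures; in a real validation the labels would come from chart review at each participating site.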
