The effects of electronic medical record phenotyping details on genetic association studies: HDL-C as a case study

Background: Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data coupled with the challenges associated with searching unstructured clinical notes requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C).

Results: We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions.

Conclusions: These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
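The comparison described in the Results can be illustrated with a minimal sketch. It is not the study's method: the abstract does not specify the five extraction algorithms or how the risk score was built, so the three phenotype definitions (`first`, `median`, `mean`), the unweighted risk score, the plain least-squares slope without covariates, and the toy records below are all assumptions invented for illustration.

```python
from statistics import mean, median


def genetic_risk_score(dosages):
    """Unweighted score: sum of trait-associated allele counts (0, 1, or 2 per variant).

    Assumption: the abstract does not say whether the study's scores were weighted.
    """
    return sum(dosages)


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (no covariates)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical ways to collapse repeated EMR lab values into one phenotype.
DEFINITIONS = {
    "first": lambda values: values[0],
    "median": median,
    "mean": mean,
}

# Invented toy records (not study data): allele dosages and repeated HDL-C in mg/dL.
subjects = [
    {"dosages": [0, 1, 0], "hdl": [40.0, 42.0, 44.0]},
    {"dosages": [1, 1, 0], "hdl": [48.0, 50.0, 52.0]},
    {"dosages": [1, 1, 1], "hdl": [55.0, 54.0, 59.0]},
    {"dosages": [2, 1, 1], "hdl": [60.0, 62.0, 64.0]},
]


def effect_by_definition(records, definitions):
    """Estimate the risk-score effect on HDL-C once per phenotype definition."""
    scores = [genetic_risk_score(r["dosages"]) for r in records]
    return {
        name: ols_slope(scores, [collapse(r["hdl"]) for r in records])
        for name, collapse in definitions.items()
    }


effects = effect_by_definition(subjects, DEFINITIONS)
for name, slope in effects.items():
    print(f"{name}: {slope:.2f} mg/dL per risk allele")
```

On these toy records the three slopes land close to one another, which is the kind of agreement across definitions that the abstract reports; a real analysis would also adjust for covariates and report standard errors.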
