The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome-wide and association with the herpes zoster medical record phenotype

The Electronic Medical Records and Genomics (eMERGE) network is a consortium of medical centers whose electronic medical records are linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of the eMERGE genotype data and the methods used to create the unified, merged set of imputed genome-wide variant genotypes. We imputed the data using the Michigan Imputation Server, which imputes missing single-nucleotide variant genotypes using the minimac3 algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used to generate this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B herpes zoster (shingles) association and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3p29).
