Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes

Genome-wide association studies (GWAS) aim to identify genetic factors associated with phenotypes. Standard analyses test variants for association one at a time. However, variant-level associations are hard to identify and can be difficult to interpret biologically. Enrichment analyses help address both problems by targeting sets of biologically related variants. Here we introduce a new model-based enrichment method that requires only GWAS summary statistics. Applying this method to interrogate 4,026 gene sets in 31 human phenotypes identifies many previously unreported enrichments, including enrichments of the endochondral ossification pathway for height, the NFAT-dependent transcription pathway for rheumatoid arthritis, brain-related genes for coronary artery disease, and liver-related genes for Alzheimer’s disease. A key feature of our method is that inferred enrichments automatically help identify new trait-associated genes. For example, accounting for enrichment in lipid transport genes highlights an association between MTTP and low-density lipoprotein levels, whereas conventional analyses of the same data found no significant variants near this gene.

In genome-wide association studies, variant-level associations are hard to identify and can be difficult to interpret biologically. Here, the authors develop a new model-based enrichment analysis method and apply it to identify new associated genes, pathways and tissues across 31 human phenotypes.
