Genome-wide rare variant analysis for thousands of phenotypes in 54,000 exomes

Defining the effects that rare variants can have on human phenotypes is essential to advancing our understanding of human health and disease. Large-scale human genetic analyses have thus far focused on common variants, but the development of large cohorts of deeply phenotyped individuals with exome sequence data has now made comprehensive analyses of rare variants possible. We analyzed the effects of rare (MAF<0.1%) variants on 3,166 phenotypes in 40,468 exome-sequenced individuals from the UK Biobank and performed replication as well as meta-analyses with 1,067 phenotypes in 13,470 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. Our analyses of non-benign coding and loss-of-function (LoF) variants identified 78 gene-based associations that passed our statistical significance threshold (p<5×10⁻⁹). These are associations in which carrying any rare coding or LoF variant in the gene is associated with an enrichment for a specific phenotype, as opposed to GWAS-based associations of strictly single variants. Importantly, our results do not suffer from the test statistic inflation that is often seen with rare variant analyses of biobank-scale data. We avoid this inflation with a methodology tailored to rare variants, which includes a step that optimizes the carrier frequency threshold for each phenotype based on prevalence. Of the 47 discovery associations whose phenotypes were represented in the replication cohort, 98% showed effects in the expected direction, and 45% attained formal replication significance (p<0.001). Six additional significant associations were identified in our meta-analysis of both cohorts.
Among the results, we confirm known associations of PCSK9 and APOB variation with LDL levels; we extend knowledge of variation in the TYRP1 gene, previously associated with blonde hair color only in Solomon Islanders, to blonde hair color in individuals of European ancestry; we show that PAPPA, a gene in which common variants had previously been associated with height via GWAS, contains rare variants that decrease height; and we make the novel discovery that STAB1 variation is associated with blood flow in the brain. Our results are available for download and interactive browsing in an app (https://ukb.research.helix.com). This comprehensive analysis of the effects of rare variants on human phenotypes marks one of the first steps in the next big phase of human genetics, where large, deeply phenotyped cohorts with next-generation sequence data will elucidate the effects of rare variants.
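
To illustrate what a gene-based (collapsing) association means in contrast to a single-variant test, the following is a minimal sketch and not the pipeline used in this study. It assumes a binary phenotype and a toy genotype layout, and it substitutes a two-sided Fisher's exact test for whatever statistical model the analysis actually applied. The function names, the data structures, and the example values are all hypothetical; only the MAF cutoff (0.1%) and the significance threshold (5×10⁻⁹) come from the text above.

```python
from math import comb

MAF_THRESHOLD = 0.001        # rare = MAF < 0.1%, as stated in the abstract
SIGNIFICANCE = 5e-9          # study-wide threshold stated in the abstract


def collapse_carriers(genotypes, mafs, maf_threshold=MAF_THRESHOLD):
    """Collapse a gene's variants into one carrier flag per sample.

    genotypes: dict of sample id -> list of alt-allele counts, one per variant
    mafs:      list of minor allele frequencies, one per variant
    A sample is a carrier if it has at least one alt allele at any rare variant.
    (Hypothetical layout; qualifying-variant filters such as "non-benign
    coding or LoF" are assumed to have been applied upstream.)
    """
    rare = [i for i, maf in enumerate(mafs) if maf < maf_threshold]
    return {s: any(g[i] > 0 for i in rare) for s, g in genotypes.items()}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def gene_association(carriers, is_case):
    """Test carrier status against a binary phenotype for one gene."""
    a = sum(1 for s, c in carriers.items() if c and is_case[s])          # carrier cases
    b = sum(1 for s, c in carriers.items() if c and not is_case[s])      # carrier controls
    c_ = sum(1 for s, c in carriers.items() if not c and is_case[s])     # non-carrier cases
    d = sum(1 for s, c in carriers.items() if not c and not is_case[s])  # non-carrier controls
    return fisher_exact_two_sided(a, b, c_, d)


if __name__ == "__main__":
    # Toy data: variant 0 is common and ignored; variants 1 and 2 are rare.
    mafs = [0.2, 0.0005, 0.0003]
    genotypes = {
        "s1": [0, 1, 0], "s2": [1, 0, 0], "s3": [0, 0, 2],
        "s4": [0, 0, 0], "s5": [1, 1, 0], "s6": [2, 0, 0],
    }
    is_case = {"s1": True, "s2": False, "s3": True,
               "s4": False, "s5": True, "s6": False}
    carriers = collapse_carriers(genotypes, mafs)
    p = gene_association(carriers, is_case)
    print(carriers, p, p < SIGNIFICANCE)
```

Collapsing pools carriers of different rare variants in the same gene into one group, which is why such a test can detect an effect that no individual rare variant has enough carriers to reveal on its own.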
