Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts

Understanding the impact of rare variants is essential to understanding human health. We analyze rare (MAF < 0.1%) variants against 4264 phenotypes in 49,960 exome-sequenced individuals from the UK Biobank and 1934 phenotypes (1821 overlapping with UK Biobank) in 21,866 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. After using our rare-variant-tailored methodology to reduce test statistic inflation, we identify 64 statistically significant gene-based associations in our meta-analysis of the two cohorts and 37 for phenotypes available in only one cohort. Singletons make significant contributions to our results, and the vast majority of the associations could not have been identified with a genotyping chip. Our results are available for interactive browsing in a webapp (https://ukb.research.helix.com). This comprehensive analysis illustrates the biological value of large, deeply phenotyped cohorts of unselected populations coupled with NGS data.

Population-based association analyses of rare genetic variants with complex traits are limited by the availability of data from sufficiently large cohorts. Here, Cirulli et al. report gene-based collapsing analysis of exomes from 49,960 participants of the UK Biobank and 21,866 participants of the Healthy Nevada Project over a total of 4377 traits.
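As a rough illustration of what a gene-based collapsing step can look like, the sketch below marks each individual as a carrier for a gene if they hold at least one variant in that gene with MAF below 0.1%, the threshold given in the abstract. It is a generic sketch, not the authors' pipeline: the function name `collapse_by_gene`, the input layout, and the toy variants and genes are all hypothetical, and any further qualifying-variant criteria the study applied (for example functional annotation filters) are not described here and are therefore omitted.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.001  # "rare" as in the abstract: MAF < 0.1%


def collapse_by_gene(genotypes, variant_gene, variant_maf,
                     maf_threshold=MAF_THRESHOLD):
    """Collapse rare variants into one carrier indicator per gene.

    genotypes:    dict variant_id -> list of alt-allele counts (0, 1, 2),
                  one entry per individual (hypothetical layout)
    variant_gene: dict variant_id -> gene symbol
    variant_maf:  dict variant_id -> minor allele frequency
    Returns dict gene -> list of 0/1 flags, one per individual.
    """
    n_individuals = len(next(iter(genotypes.values()))) if genotypes else 0
    carriers = defaultdict(lambda: [0] * n_individuals)
    for variant, counts in genotypes.items():
        if variant_maf[variant] >= maf_threshold:
            continue  # skip variants that are not rare
        flags = carriers[variant_gene[variant]]
        for i, alt_count in enumerate(counts):
            if alt_count > 0:
                flags[i] = 1  # carries at least one rare allele in this gene
    return dict(carriers)


# Toy data for four individuals (illustrative only, not study data).
genotypes = {
    "v1": [1, 0, 0, 0],
    "v2": [0, 0, 1, 0],
    "v3": [0, 1, 1, 2],  # common variant, excluded by the MAF filter
    "v4": [0, 0, 0, 1],
}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_A", "v4": "GENE_B"}
variant_maf = {"v1": 0.0002, "v2": 0.0005, "v3": 0.05, "v4": 0.0001}

result = collapse_by_gene(genotypes, variant_gene, variant_maf)
print(result)
```

The per-gene carrier indicators produced this way would then be tested against each phenotype, which is why singletons (variants seen in only one individual) can still contribute: they add carriers to a gene even though they could never be tested on their own.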
