Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score

Rare variant association tests (RVAT) have been developed to study the contribution of rare variants, which are now widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering variants to retain only those most likely to be causal. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. With whole-genome sequence (WGS) data, however, both steps are challenging, because no natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty, but they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: “RAVA-FIRST” (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units, referred to as “CADD regions”, are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. On both simulated and real data, RAVA-FIRST was found to outperform other WGS-based RVAT. Applied to a WGS dataset of venous thromboembolism patients, RAVA-FIRST identified an intergenic region on chromosome 18 that is enriched for rare variants in early-onset patients and that was missed by standard sliding-window procedures. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages.
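To make steps (2) and (3) concrete, the following is a minimal Python sketch, not the Ravages implementation. It assumes that variants have already been assigned to CADD regions and genomic categories, and that a score threshold is available for each region. The function name `burden_subscores`, the data layout, the threshold values, and the category labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def burden_subscores(variants, genotypes, thresholds):
    """Filter variants per region, then sum alleles per (region, category).

    variants   : list of dicts with keys "id", "region", "category", "score"
    genotypes  : dict individual -> dict variant id -> alternate allele count
    thresholds : dict region -> minimum score to keep a variant (assumed given)
    Returns a dict individual -> dict (region, category) -> burden sub-score.
    """
    # Step (2): region-dependent filtering, one threshold per region.
    kept = [v for v in variants if v["score"] >= thresholds[v["region"]]]

    # Step (3), first half: one burden sub-score per genomic category
    # within each region, for every individual.
    scores = {}
    for individual, calls in genotypes.items():
        sub = defaultdict(int)
        for v in kept:
            sub[(v["region"], v["category"])] += calls.get(v["id"], 0)
        scores[individual] = dict(sub)
    return scores


# Toy data (entirely made up for illustration).
variants = [
    {"id": "v1", "region": "R1", "category": "coding", "score": 25.0},
    {"id": "v2", "region": "R1", "category": "intergenic", "score": 5.0},
    {"id": "v3", "region": "R1", "category": "intergenic", "score": 18.0},
    {"id": "v4", "region": "R2", "category": "regulatory", "score": 12.0},
]
thresholds = {"R1": 15.0, "R2": 10.0}
genotypes = {
    "ind_a": {"v1": 1, "v2": 1, "v3": 2},
    "ind_b": {"v4": 1},
}

scores = burden_subscores(variants, genotypes, thresholds)
for individual, sub in scores.items():
    print(individual, sub)
```

In this sketch, variant `v2` is discarded because its score falls below the threshold of its region, so it does not contribute to any sub-score. An association test would then relate the phenotype to these per-category sub-scores within each region, for instance through a regression model; that step is omitted here.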
Author Summary

Technological progress has made whole-genome sequencing possible at an unprecedented scale, opening up the possibility of exploring the role of low-frequency genetic variants in common diseases. The challenge is now methodological: novel methods and strategies are needed to analyse sequencing data without being limited to assessing the role of coding variants. With RAVA-FIRST, we propose a novel strategy that takes advantage of biological information to investigate the role of rare variants across the whole genome. In particular, RAVA-FIRST relies on testing units that go beyond genes to group rare variants in the association tests. In this work, we show that this new strategy has several advantages over existing methods. RAVA-FIRST offers an easy and straightforward analysis of genome-wide rare variants, especially intergenic ones, which are frequently left aside, making it a promising tool for better understanding the biology of complex diseases.
