Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts

BackgroundData from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms.DescriptionWe have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples.ConclusionsRVS facilitates cross-study analysis to discover novel genetic risk factors, gene–disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization.AvailabilityA web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.

[1]  Steven M. Gallo,et al.  REDfly v3.0: toward a comprehensive database of transcriptional regulatory elements in Drosophila , 2010, Nucleic Acids Res..

[2]  Eivind Hovig,et al.  Performance comparison of four exome capture systems for deep sequencing , 2014, BMC Genomics.

[3]  Daniel R. Zerbino,et al.  Ensembl 2014 , 2013, Nucleic Acids Res..

[4]  Melissa J. Landrum,et al.  RefSeq: an update on mammalian reference sequences , 2013, Nucleic Acids Res..

[5]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[6]  Daniel Rios,et al.  Bioinformatics Applications Note Databases and Ontologies Deriving the Consequences of Genomic Variants with the Ensembl Api and Snp Effect Predictor , 2022 .

[7]  P. Stenson,et al.  The Human Gene Mutation Database (HGMD) and Its Exploitation in the Fields of Personalized Genomics and Molecular Evolution , 2012, Current protocols in bioinformatics.

[8]  Chitta Baral,et al.  A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions , 2012, J. Biomed. Informatics.

[9]  Jacob A. Tennessen,et al.  Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes , 2012, Science.

[10]  Lucila Ohno-Machado,et al.  Translational bioinformatics: linking knowledge across biological and clinical realms , 2011, J. Am. Medical Informatics Assoc..

[11]  Mingming Jia,et al.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer , 2014, Nucleic Acids Res..

[12]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[13]  Gunnar Houge,et al.  Monoallelic and biallelic mutations in MAB21L2 cause a spectrum of major eye malformations. , 2014, American journal of human genetics.

[14]  Tatiana A. Tatusova,et al.  Gene: a gene-centered information resource at NCBI , 2014, Nucleic Acids Res..

[15]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[16]  Neha Deshpande,et al.  SG-ADVISER CNV: copy-number variant annotation and interpretation , 2014, Genetics in Medicine.

[17]  E. Boerwinkle,et al.  dbNSFP v2.0: A Database of Human Non‐synonymous SNVs and Their Functional Predictions and Annotations , 2013, Human mutation.

[18]  Sean D Mooney,et al.  Bioinformatic tools for identifying disease gene and SNP candidates. , 2010, Methods in molecular biology.

[19]  Elizabeth T. Cirulli,et al.  The Characterization of Twenty Sequenced Human Genomes , 2010, PLoS genetics.

[20]  Russ B Altman,et al.  PharmGKB: the Pharmacogenomics Knowledge Base. , 2013, Methods in molecular biology.

[21]  S. Henikoff,et al.  Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm , 2009, Nature Protocols.

[22]  Russ B Altman,et al.  PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. , 2005, Methods in molecular biology.

[23]  Birgit Lorenz,et al.  Expansion of ocular phenotypic features associated with mutations in ADAMTS18. , 2014, JAMA ophthalmology.

[24]  Chao Chen,et al.  dbVar and DGVa: public archives for genomic structural variation , 2012, Nucleic Acids Res..

[25]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[26]  The Uniprot Consortium,et al.  UniProt: a hub for protein information , 2014, Nucleic Acids Res..

[27]  Francesco Muntoni,et al.  Managing clinically significant findings in research: the UK10K example , 2014, European Journal of Human Genetics.

[28]  Pablo Cingolani,et al.  © 2012 Landes Bioscience. Do not distribute. , 2022 .

[29]  Karin M. Verspoor,et al.  Open Peer Review Invited Referee Responses , 2022 .

[30]  D. Valle,et al.  Online Mendelian Inheritance In Man (OMIM) , 2000, Human mutation.

[31]  Simon Cawley,et al.  Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array. , 2011, Genomics.

[32]  Goran Nenadic,et al.  The GNAT library for local and remote gene mention normalization , 2011, Bioinform..

[33]  Aaron R. Quinlan,et al.  GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations , 2013, PLoS Comput. Biol..

[34]  D. G. MacArthur,et al.  Guidelines for investigating causality of sequence variants in human disease , 2014, Nature.

[35]  V. McKusick Mendelian inheritance in man , 1971 .

[36]  Deanna M. Church,et al.  ClinVar: public archive of relationships among sequence variation and human phenotype , 2013, Nucleic Acids Res..

[37]  Huaiyu Mi,et al.  The InterPro protein families database: the classification resource after 15 years , 2014, Nucleic Acids Res..

[38]  Ulf Gyllensten,et al.  CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects , 2014, Database J. Biol. Databases Curation.

[39]  Doron Betel,et al.  The microRNA.org resource: targets and expression , 2007, Nucleic Acids Res..

[40]  Yongwook Choi,et al.  PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels , 2015, Bioinform..

[41]  Vincent A. Fusaro,et al.  A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature , 2014, Bioinform..

[42]  T. Hampton,et al.  The Cancer Genome Atlas , 2020, Indian Journal of Medical and Paediatric Oncology.

[43]  Euan A Ashley,et al.  Clinical interpretation and implications of whole-genome sequencing. , 2014, JAMA.

[44]  S. Antonarakis,et al.  Corrigendum: Mutation nomenclature extensions and suggestions to describe complex mutations: A discussion , 2002, Human mutation.

[45]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[46]  Jean-Baptiste Cazier,et al.  Choice of transcripts and software has a large effect on variant annotation , 2014, Genome Medicine.

[47]  C. Sander,et al.  Predicting the functional impact of protein mutations: application to cancer genomics , 2011, Nucleic acids research.

[48]  François Schiettecatte,et al.  OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders , 2014, Nucleic Acids Res..

[49]  C. Jack,et al.  Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI) , 2005, Alzheimer's & Dementia.

[50]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..

[51]  I. Adzhubei,et al.  Predicting Functional Effect of Human Missense Mutations Using PolyPhen‐2 , 2013, Current protocols in human genetics.