VarCards: an integrated genetic and clinical database for coding variants in the human genome

Abstract A growing number of genomic tools and databases have been developed to facilitate the interpretation of genomic variants, particularly in coding regions. However, these tools are scattered across separate websites and databases, making it difficult for clinicians, geneticists and biologists to obtain first-hand information on particular variants and genes of interest. Starting with coding regions and splice sites, we artificially generated all possible single nucleotide variants (n = 110 154 363) and cataloged all reported insertions and deletions (n = 1 223 370). We then annotated these variants for functional consequences using more than 60 genomic data sources to develop a database, named VarCards (http://varcards.biols.ac.cn/), with which users can conveniently search, browse and annotate the variant- and gene-level implications of given variants, including the following information: (i) functional effects; (ii) functional consequences predicted by different in silico algorithms; (iii) allele frequencies in different populations; (iv) disease- and phenotype-related knowledge; (v) general gene-level information; and (vi) drug–gene interactions. As a case study, we successfully employed VarCards in the interpretation of de novo mutations in autism spectrum disorders. In conclusion, VarCards provides an intuitive interface to the information researchers need to prioritize candidate variants and genes.
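
To make the phrase "all possible single nucleotide variants" concrete, the following is a minimal sketch of how such a catalog could be enumerated: every reference base is paired with each of its three alternate bases. It is an illustration only, not the VarCards pipeline; the `SNV` record, the `enumerate_snvs` function, the coordinates and the toy sequence are all hypothetical. Under this scheme each reference position contributes three variants, which is consistent with the reported total of 110 154 363 being divisible by three.

```python
from typing import Iterator, NamedTuple

BASES = "ACGT"


class SNV(NamedTuple):
    """Hypothetical record for one single nucleotide variant."""
    chrom: str
    pos: int  # 1-based position of the substituted base
    ref: str
    alt: str


def enumerate_snvs(chrom: str, start: int, ref_seq: str) -> Iterator[SNV]:
    """Yield every possible SNV for a reference segment.

    `start` is the 1-based coordinate of the first base in `ref_seq`.
    Ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    for offset, ref in enumerate(ref_seq.upper()):
        if ref not in BASES:
            continue
        for alt in BASES:
            if alt != ref:
                yield SNV(chrom, start + offset, ref, alt)


# Toy example: a three-base segment (made-up coordinates) yields 3 x 3 variants.
variants = list(enumerate_snvs("chr1", 100, "ATG"))
for v in variants:
    print(f"{v.chrom}:{v.pos} {v.ref}>{v.alt}")
```

In a real setting the reference segments would come from the coding and splice-site intervals of a gene annotation, and each emitted record would then be passed to the annotation sources.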
