A practical guide to filtering and prioritizing genetic variants.

Next-generation sequencing (NGS) of whole genomes and exomes is a powerful tool in biomedical research and clinical diagnostics. However, the vast amount of data produced by NGS introduces new challenges and opportunities, many of which require novel computational and theoretical approaches to identify the causal variant(s) for a disease of interest. While workflows and associated software for processing raw data and producing high-confidence variant calls have improved significantly, filtering tens of thousands of candidates to identify the subset relevant to a specific study remains a complex exercise best left to bioinformaticists. Yet because this prioritization procedure requires biological and biomedical reasoning, biologists and clinicians are increasingly motivated to handle the task themselves. Here, we describe a set of guidelines, tools, and online resources that can be used to identify functional variants from whole-genome and whole-exome variant calls and then to prioritize those variants with potential associations to phenotypes of interest. Insights gained from a recently published analysis of protein-coding gene variation in >60,000 humans by the Exome Aggregation Consortium (ExAC) are also taken into account.
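
As a rough illustration of the two stages named above (filtering to functional variants, then prioritizing them), the following minimal Python sketch may help. It is not the procedure described in the paper: the `Variant` record, the choice of consequence terms, the 1% allele-frequency cutoff, and the candidate-gene ranking are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: Sequence Ontology consequence terms treated as "functional" here.
FUNCTIONAL_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "missense_variant",
}


@dataclass
class Variant:
    """Hypothetical, simplified record for one annotated variant call."""
    gene: str
    consequence: str      # predicted effect on the transcript
    population_af: float  # allele frequency in a reference panel such as ExAC


def filter_variants(variants, max_af=0.01):
    """Keep variants that are predicted functional and rare in the population.

    The 1% default cutoff is an illustrative assumption, not a recommendation.
    """
    return [
        v for v in variants
        if v.consequence in FUNCTIONAL_CONSEQUENCES and v.population_af <= max_af
    ]


def prioritize(variants, candidate_genes):
    """Rank variants in phenotype-associated genes first, then rarer ones first."""
    return sorted(
        variants,
        key=lambda v: (v.gene not in candidate_genes, v.population_af),
    )


# Toy input; gene names and frequencies are made up for the example.
calls = [
    Variant("GENE_A", "missense_variant", 0.0001),
    Variant("GENE_B", "synonymous_variant", 0.0),
    Variant("GENE_C", "stop_gained", 0.2),
    Variant("GENE_D", "frameshift_variant", 0.001),
]
ranked = prioritize(filter_variants(calls), candidate_genes={"GENE_D"})
for v in ranked:
    print(v.gene, v.consequence, v.population_af)
```

In practice the consequence and frequency fields would come from an annotation tool's output, and the candidate-gene set from phenotype or pathway resources; the sketch only shows how the filtering and ranking steps compose.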
