Settling the score: variant prioritization and Mendelian disease

When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype–phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing.
