Improved exome prioritization of disease genes through cross-species phenotype comparison

Whole-exome sequencing studies have identified numerous new disease-gene associations in the last few years. However, many cases remain unsolved because too many candidate variants survive common filtering strategies, such as removing low-quality variants, common variants, and variants deemed unlikely to be pathogenic. Because each human genome contains about 100 genuine loss-of-function variants, these strategies alone often cannot pinpoint the causative mutation. We propose using the wealth of genotype-to-phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm implemented in our Exomiser tool that integrates phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods, with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
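The idea of combining a variant-based score with a phenotype-based score can be illustrated with a minimal sketch. This is not the Exomiser implementation: the `Candidate` class, the gene names, the score values, and the weighted-average combination are all assumptions made for illustration, and both scores are assumed to be already normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    # Hypothetical score in [0, 1] summarizing allele frequency and
    # predicted pathogenicity of the gene's best variant.
    variant_score: float
    # Hypothetical score in [0, 1] for phenotype similarity between the
    # patient's disease and mouse models of this gene.
    phenotype_score: float


def combined_score(c: Candidate, phenotype_weight: float = 0.5) -> float:
    """Weighted average of the two scores (an assumed combination rule)."""
    return (phenotype_weight * c.phenotype_score
            + (1.0 - phenotype_weight) * c.variant_score)


def rank(candidates, phenotype_weight: float = 0.5):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, phenotype_weight),
                  reverse=True)


# Illustrative, made-up candidates: GENE_A has the most damaging-looking
# variant, but GENE_B has a mouse model resembling the patient phenotype.
candidates = [
    Candidate("GENE_A", variant_score=0.95, phenotype_score=0.10),
    Candidate("GENE_B", variant_score=0.90, phenotype_score=0.80),
    Candidate("GENE_C", variant_score=0.60, phenotype_score=0.30),
]

variant_only = [c.gene for c in rank(candidates, phenotype_weight=0.0)]
with_phenotype = [c.gene for c in rank(candidates)]
print("variant only:  ", variant_only)
print("with phenotype:", with_phenotype)
```

In this toy example, ranking on the variant score alone puts GENE_A first, while adding the phenotype score moves GENE_B to the top. That is the kind of re-ranking the abstract describes, although the real scoring functions and weights may differ.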
