OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization

Motivation: Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task.

Results: Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool.

Availability and implementation: OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: umaan@leeds.ac.uk
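The filter-then-score workflow summarized in the Results can be illustrated with a minimal Python sketch. It is not OVA's implementation: the toy ontology, the gene names, the `Variant` fields, and the Jaccard similarity over ancestor sets are all assumptions chosen for illustration, and the abstract does not state which similarity measure OVA uses.

```python
from dataclasses import dataclass

# Hypothetical toy ontology (child term -> parent terms); not real ontology IDs.
PARENTS = {
    "retinal_degeneration": {"eye_abnormality"},
    "cataract": {"eye_abnormality"},
    "eye_abnormality": {"phenotype"},
    "hearing_loss": {"ear_abnormality"},
    "ear_abnormality": {"phenotype"},
    "phenotype": set(),
}

@dataclass(frozen=True)
class Variant:
    gene: str
    genotype: str  # "hom" or "het" (assumed encoding)
    effect: str    # e.g. "missense", "nonsense", "synonymous"

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen

def similarity(terms_a, terms_b):
    """Jaccard overlap of ancestor closures: a simple stand-in measure."""
    closure_a = set().union(*(ancestors(t) for t in terms_a))
    closure_b = set().union(*(ancestors(t) for t in terms_b))
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0

def filter_variants(variants, genotype, damaging_effects):
    """Step 1: keep variants matching the genotype and a damaging effect."""
    return [v for v in variants
            if v.genotype == genotype and v.effect in damaging_effects]

def prioritize(variants, gene_annotations, query_terms):
    """Step 2: score each remaining variant's gene against the query phenotype."""
    scored = [(v, similarity(gene_annotations.get(v.gene, set()), query_terms))
              for v in variants]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Usage with made-up data
variants = [
    Variant("GENE_A", "hom", "missense"),
    Variant("GENE_B", "hom", "missense"),
    Variant("GENE_C", "hom", "synonymous"),
    Variant("GENE_D", "het", "nonsense"),
]
annotations = {
    "GENE_A": {"retinal_degeneration"},
    "GENE_B": {"hearing_loss"},
    "GENE_C": {"retinal_degeneration"},
    "GENE_D": {"cataract"},
}
kept = filter_variants(variants, "hom", {"missense", "nonsense", "frameshift"})
ranked = prioritize(kept, annotations, {"retinal_degeneration"})
for variant, score in ranked:
    print(variant.gene, round(score, 3))
```

Filtering first keeps the comparatively expensive ontology comparison limited to the variants that survive the genotype and effect criteria. A real tool would draw annotations from several ontologies and from model organism data, as the abstract describes, rather than from a single hand-written dictionary.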
