A pipeline‐friendly software tool for genome diagnostics to prioritize genes by matching patient symptoms to literature

Despite the explosive growth of next‐generation sequencing data, genome diagnostics provides a molecular diagnosis for only a minority of patients. Software tools that prioritize genes based on patient symptoms using known gene‐disease associations may complement variant filtering and interpretation and so increase the chances of success. However, many of these tools cannot be used in practice because they are embedded within variant prioritization algorithms, or exist as remote services that cannot be relied upon or are unacceptable because of legal or ethical barriers. In addition, many tools are not designed for command‐line use, or are closed‐source, abandoned, or unavailable. We present Variant Interpretation using Biomedical literature Evidence (VIBE), a tool to prioritize disease genes based on Human Phenotype Ontology codes. VIBE is a locally installed executable that ensures operational availability and is built upon DisGeNET‐RDF, a comprehensive knowledge platform containing gene‐disease associations mostly from literature and variant‐disease associations mostly from curated source databases. VIBE's command‐line interface and output are designed for easy incorporation into bioinformatic pipelines that annotate and prioritize variants for further clinical interpretation. We evaluate VIBE in a benchmark based on 305 patient cases alongside seven other tools. Our results demonstrate that VIBE offers consistent performance with few cases missed, but we also find high complementarity among all tested tools. VIBE is a powerful, free, open‐source, and locally installable solution for prioritizing genes based on patient symptoms. Project source code, documentation, benchmark, and executables are available at https://github.com/molgenis/vibe.
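To make the general idea of phenotype‐driven gene prioritization concrete, the following is a minimal sketch and not VIBE's actual implementation. It assumes a toy in‐memory table in place of DisGeNET‐RDF; the phenotype codes, disease names, gene names, scores, and the function `prioritize_genes` are all hypothetical, and the "highest association score per gene" ranking rule is an illustrative assumption rather than the method described in the paper.

```python
from collections import defaultdict

# Hypothetical toy data standing in for a gene-disease knowledge base.
# None of these identifiers or scores are real.
PHENOTYPE_TO_DISEASES = {
    "HP:x1": ["disease_a", "disease_b"],
    "HP:x2": ["disease_b"],
}

# (gene, disease) -> association score (made-up values)
GENE_DISEASE_SCORES = {
    ("GENE1", "disease_a"): 0.3,
    ("GENE1", "disease_b"): 0.5,
    ("GENE2", "disease_b"): 0.7,
    ("GENE3", "disease_c"): 0.9,
}


def prioritize_genes(hpo_codes):
    """Rank genes linked to the given phenotype codes.

    Assumed rule: a gene's rank is determined by its highest
    association score with any disease matching the phenotypes.
    """
    # Collect every disease associated with at least one input phenotype.
    diseases = set()
    for code in hpo_codes:
        diseases.update(PHENOTYPE_TO_DISEASES.get(code, []))

    # Keep the best score per gene over the matched diseases.
    best = defaultdict(float)
    for (gene, disease), score in GENE_DISEASE_SCORES.items():
        if disease in diseases:
            best[gene] = max(best[gene], score)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best, key=lambda gene: (-best[gene], gene))


if __name__ == "__main__":
    print(prioritize_genes(["HP:x1", "HP:x2"]))
```

In a pipeline, a ranked gene list of this kind could be joined against annotated variants so that variants in higher‐ranked genes are reviewed first.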
