PubCaseFinder: A Case-Report-Based, Phenotype-Driven Differential-Diagnosis System for Rare Diseases.

Phenotype-driven differential-diagnosis systems have recently been developed to speed up the differential diagnosis of rare diseases from the signs and symptoms observed in an affected individual. The performance of these systems depends on the quantity and quality of the underlying databases of disease-phenotype associations (DPAs). Such databases are often developed by manual curation, which inherently limits their coverage. To address this problem, we propose a text-mining approach that increases the coverage of DPA databases and thereby improves the performance of differential-diagnosis systems. Our analysis showed that text mining of one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system provided as a freely available web application. By using DPAs automatically extracted from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports through phenotype-based comparisons and confirm the results with detailed contextual information.
