Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation

Background
The Centre for Therapeutic Target Validation (CTTV, https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly generated data. In some resources, data integration has been achieved by mapping metadata such as diseases and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. However, ontologies are not ideal for representing the "sometimes associated" type of relationship that this requires. This work addresses two challenges: the annotation of diverse big data, and the representation of complex, "sometimes associated" relationships between concepts.

Methods
Semantic mapping uses a combination of custom scripting, our annotation tool ‘Zooma’, and expert curation. Disease-phenotype associations were generated by literature mining of Europe PubMed Central abstracts and were then manually verified by experts. The disease-phenotype associations are represented with the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents an association between a subject and an object (here, a disease and its associated phenotype) together with the source of evidence for that association. Indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV.

Results
EFO yields an average mapping coverage of over 80% across all data sources. Manual verification of the text-mined disease-phenotype associations gives a precision of 42%. This results in 1452 and 2810 disease-phenotype pairs for inflammatory bowel disease (IBD) and autoimmune disease, respectively, and contributes towards 11,338 rare disease associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study.

Conclusions
Here we present solutions for large-scale annotation-ontology mapping in the CTTV knowledge base and a process for disease-phenotype mining, and we propose a generic association model, ‘OBAN’, as a means to integrate diseases using shared phenotypes.

Availability
EFO is released monthly and is available for download at http://www.ebi.ac.uk/efo/.
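To make the association model more concrete, the following is a minimal Python sketch of the idea described above: each association records a subject (a disease), an object (a phenotype), and the source of evidence, and rare diseases are linked to common diseases through the phenotypes they share. It is an illustration only, not the OBAN implementation: the class name `Association`, the function `link_by_shared_phenotypes`, and all disease, phenotype, and evidence labels are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One association: a subject, an object, and the evidence for the link."""
    subject: str   # disease label or identifier (placeholder values below)
    object_: str   # phenotype label or identifier
    evidence: str  # source of evidence, e.g. a literature reference


def link_by_shared_phenotypes(rare, common):
    """Return {(common_disease, rare_disease): set of shared phenotypes}."""
    # Index rare diseases by phenotype so each common-disease
    # association can be matched with a single lookup.
    phenotype_to_rare = defaultdict(set)
    for assoc in rare:
        phenotype_to_rare[assoc.object_].add(assoc.subject)

    links = defaultdict(set)
    for assoc in common:
        for rare_disease in phenotype_to_rare.get(assoc.object_, ()):
            links[(assoc.subject, rare_disease)].add(assoc.object_)
    return dict(links)


# Hypothetical example data; none of these labels come from the study.
rare = [
    Association("rare disease A", "phenotype X", "source 1"),
    Association("rare disease A", "phenotype Y", "source 2"),
    Association("rare disease B", "phenotype Z", "source 3"),
]
common = [
    Association("common disease C", "phenotype X", "source 4"),
    Association("common disease C", "phenotype Y", "source 5"),
    Association("common disease C", "phenotype W", "source 6"),
]

links = link_by_shared_phenotypes(rare, common)
for (common_disease, rare_disease), shared in sorted(links.items()):
    print(common_disease, "<->", rare_disease, "via", sorted(shared))
```

Keeping the evidence on each association, rather than on the disease or the phenotype, means that every indirect disease-to-disease link can be traced back to the individual sources that support it.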
