A method exploiting syntactic patterns and the UMLS semantics for aligning biomedical ontologies: The case of OBO disease ontologies

The OBO ontologies include more than 50 standard vocabularies covering different domains, including genomics, chemistry, anatomy and phenotype. Ontology alignment is a means of building consistent biomedical ontologies that are compatible with standard vocabularies and dedicated to specific domains, such as cancer. An alignment is defined as a set of pairs of concepts, coming from two ontologies, related by a relation R, where R is not restricted to equivalence or subsumption. Alignment is performed in three major steps: first, the concepts that are equivalent in the two ontologies are identified; second, pairs of concepts that are related but not equivalent are searched for; third, the relations between these concepts are characterized. We have developed an alignment method that exploits the compositionality of the terms in OBO ontologies, uses the UMLS to provide synonyms and relations, and defines syntactico-semantic patterns that semantically characterize the relations between concepts. We applied it to four OBO phenotype ontologies: mouse pathology, human disease, mammalian phenotype, and PATO. We found 386 pairs of equivalent concepts and 20,461 pairs of concepts in which the name of one concept is included in the name of the other. Among these inclusions, we were able to provide a semantic categorization for 2682 relations. In 2552 cases, the relation was present and semantically defined in the UMLS Metathesaurus, and in 131 cases it was characterized through semantic patterns. Our approach may help identify semantic relations between concepts in ontologies.
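The three alignment steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. A plain dictionary stands in for UMLS synonymy, and the whole-word inclusion test is only one simple reading of "the name of one concept is included in the name of the other". The two patterns and their relation labels (`is_a?`, `has_location?`) are hypothetical placeholders for the paper's syntactico-semantic patterns. All function names and concept identifiers are invented for the example.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for UMLS synonymy: normalized term -> synonyms.
Synonyms = Dict[str, Set[str]]

def normalize(name: str) -> str:
    """Lowercase and keep alphanumeric tokens separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))

def variants(name: str, synonyms: Synonyms) -> Set[str]:
    """The normalized name plus any synonyms listed for it (one-directional lookup)."""
    base = normalize(name)
    return {base} | {normalize(s) for s in synonyms.get(base, set())}

def includes(longer: str, shorter: str) -> bool:
    """True if `shorter` occurs in `longer` as whole words and the two differ."""
    return longer != shorter and f" {shorter} " in f" {longer} "

def characterize(longer: str, shorter: str) -> str:
    """Toy syntactic patterns (hypothetical); '?' marks an unconfirmed relation."""
    if longer.endswith(" of " + shorter):
        return "has_location?"  # "X of Y", e.g. "abnormality of lung" vs "lung"
    if longer.endswith(" " + shorter):
        return "is_a?"  # "modifier Y", e.g. "lung carcinoma" vs "carcinoma"
    return "unclassified"  # inclusion found, but no pattern matched

def first_inclusion(va: Set[str], vb: Set[str]) -> Optional[Tuple[bool, str, str]]:
    """Return (a_is_longer, longer, shorter) for the first inclusion found, else None."""
    for x in sorted(va):  # sorted for deterministic results
        for y in sorted(vb):
            if includes(x, y):
                return True, x, y
            if includes(y, x):
                return False, y, x
    return None

def align(onto_a: Dict[str, str], onto_b: Dict[str, str], synonyms: Synonyms):
    """Return (equivalent pairs, inclusion triples (longer_id, relation, shorter_id))."""
    equivalent: List[Tuple[str, str]] = []
    inclusions: List[Tuple[str, str, str]] = []
    for id_a, name_a in onto_a.items():
        va = variants(name_a, synonyms)
        for id_b, name_b in onto_b.items():
            vb = variants(name_b, synonyms)
            if va & vb:  # step 1: a shared name or synonym means equivalence
                equivalent.append((id_a, id_b))
                continue
            hit = first_inclusion(va, vb)  # step 2: related but not equivalent
            if hit is not None:
                a_is_longer, longer, shorter = hit
                label = characterize(longer, shorter)  # step 3: characterize
                inclusions.append((id_a, label, id_b) if a_is_longer else (id_b, label, id_a))
    return equivalent, inclusions

# Invented toy ontologies and synonym table, for illustration only.
onto_a = {"A1": "Carcinoma", "A2": "Lung carcinoma", "A3": "Lung"}
onto_b = {"B1": "carcinoma", "B2": "Abnormality of lung", "B3": "Pulmonary carcinoma"}
synonyms = {"lung carcinoma": {"pulmonary carcinoma"}}

equivalent, inclusions = align(onto_a, onto_b, synonyms)
print(equivalent)  # [('A1', 'B1'), ('A2', 'B3')]
print(inclusions)  # [('B3', 'is_a?', 'A1'), ('A2', 'is_a?', 'B1'), ('B2', 'has_location?', 'A3')]
```

Because English noun phrases are usually head-final, a name that ends with the shorter name ("lung carcinoma" / "carcinoma") is flagged here as a candidate subsumption, while "X of Y" is flagged as a candidate location. The question marks signal that such labels would still need confirmation, for instance against the relations recorded in the UMLS Metathesaurus.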
