Ontology quality assurance through analysis of term transformations

Motivation: It is important for the quality of biological ontologies that similar concepts be expressed consistently, or univocally. Univocality matters both for human users of an ontology and for computational tools that rely on regularity in the structure of terms. In practice, however, terms are not always expressed consistently, and methods are needed to identify terms that are not univocal so that they can be corrected.

Results: We developed an automated, transformation-based clustering methodology for detecting terms that use different linguistic conventions to express similar semantics. The resulting term sets represent occurrences of univocality violations. Our method identified 67 examples of univocality violations in the Gene Ontology.

Availability: The identified univocality violations are available upon request. We are preparing an open-source release of the software, to be made available at http://bionlp.sourceforge.net.

Contact: karin.verspoor@ucdenver.edu
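The transformation-based clustering idea in the Results can be illustrated with a toy sketch. This is not the authors' software. The transformation rules here are hypothetical assumptions: lowercase the term, drop a small stopword list, apply crude suffix stripping (a simplified stand-in for a real stemmer such as Porter's algorithm), and ignore word order. Terms that collapse to the same key but are written differently become candidate univocality violations. The names `transform` and `find_univocality_candidates` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical transformation rules (assumptions for illustration only).
STOPWORDS = {"of", "the", "a", "an", "to", "in"}
SUFFIXES = ("ation", "ion", "ing", "s")  # crude suffix stripping, not a real stemmer


def normalize_token(tok):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suf in SUFFIXES:
        if tok.endswith(suf) and len(tok) - len(suf) >= 3:
            return tok[: -len(suf)]
    return tok


def transform(term):
    """Map a term to a canonical key: lowercase, drop stopwords, strip suffixes, sort tokens."""
    tokens = re.findall(r"[a-z0-9-]+", term.lower())
    content = [normalize_token(t) for t in tokens if t not in STOPWORDS]
    # Sorting makes the key insensitive to word order ("X regulation" vs "regulation of X").
    return tuple(sorted(content))


def find_univocality_candidates(terms):
    """Cluster terms by transformed key; return clusters with more than one surface form."""
    clusters = defaultdict(list)
    for term in terms:
        clusters[transform(term)].append(term)
    return [sorted(set(group)) for group in clusters.values() if len(set(group)) > 1]


terms = [
    "regulation of cell growth",
    "cell growth regulation",
    "protein binding",
    "binding of protein",
    "DNA repair",
]
for cluster in find_univocality_candidates(terms):
    print(cluster)
```

Here "regulation of cell growth" and "cell growth regulation" land in the same cluster because they differ only in stopwords and word order, while "DNA repair" stands alone. A real system would need linguistically informed transformations to avoid merging terms whose differences are semantically meaningful.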
