OntoFox: web-based support for ontology reuse

Background: Ontology development is a rapidly growing area of research, especially in the life sciences domain. To promote collaboration and interoperability between different projects, the OBO Foundry principles require that these ontologies be open and non-redundant, avoiding duplication of terms through the reuse of existing resources. Because current options for doing so present various difficulties, a new approach, MIREOT, was developed to specify the import of single terms. Initial implementations allow for controlled import of selected annotations and certain classes of related terms.

Findings: OntoFox (http://ontofox.hegroup.org/) is a web-based system that allows users to input terms, fetch selected properties, annotations, and certain classes of related terms from the source ontologies, and save the results using the RDF/XML serialization of the Web Ontology Language (OWL). Compared to an initial implementation of MIREOT, OntoFox offers additional and more easily configurable options for selecting and rewriting annotation properties, and for including all or a computed subset of the terms between low-level and top-level terms. Additional methods for including related classes are a SPARQL-based ontology term retrieval algorithm that extracts terms related to a given set of signature terms, and an option to extract the hierarchy rooted at a specified ontology term. OntoFox's output can be directly imported into a developer's ontology. OntoFox currently supports term retrieval from a selection of 15 ontologies accessible via SPARQL endpoints and allows users to extend this by specifying additional endpoints. An OntoFox application in the development of the Vaccine Ontology (VO) is demonstrated.

Conclusions: OntoFox provides a timely, publicly available service that offers users different options for collecting terms from external ontologies and makes them available for reuse by import into client OWL ontologies.

Background

Biomedical ontologies are sets of terms and relations that represent entities in the scientific world and how they relate to each other. Terms are associated with documentation and definitions, which are, ideally, expressed in formal logic in order to support automated reasoning [1-3]. Ontologies have dramatically changed how biomedical research is conducted. For example, since the Gene Ontology (GO) was first published in 2000 [1], it has been used and cited in more than 2000 peer-reviewed journal articles [4]. Ontologies have been used in various applications, such as gene expression data analysis [1], literature mining [5], and as the underpinning of a semantic web [6]. There are currently more than 150 biomedical ontologies and 700,000 entities in the NCBO BioPortal (http://bioportal.bioontology.org/). With new resources continuously being developed, maximizing ontology sharing and interoperability has become a growing concern [7,8]. The development of a new biomedical ontology covering a specific domain is often an ambitious, time-consuming project, usually requiring extensive cross-community collaboration. The OBO Foundry is an open community that has established a set of principles for ontology development with the goal of creating a suite of interoperable reference ontologies in the biomedical domain [3]. These principles require that member ontologies be open, orthogonal, expressed in a common shared syntax, and designed to possess a common space of identifiers. One way of meeting the goal of interoperability is to reuse existing resources by importing them into the to-be-created ontology. For example, the Vaccine Ontology (VO; http://www.violinet.org/vaccineontology) [9] relies on many terms (e.g., administering substance in vivo) already described by other biomedical ontologies, such as the Ontology for Biomedical Investigations (OBI; http://purl.obolibrary.org/obo/obi).
* Correspondence: yongqunh@umich.edu. Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA. Full list of author information is available at the end of the article. © 2010 He et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Xiang et al. BMC Research Notes 2010, 3:175 http://www.biomedcentral.com/1756-0500/3/175

OWL currently only provides a mechanism to import ontologies as a whole [10]. This approach is reasonable and recommended for small ontologies that are designed in ways consistent with the importing ontology. However, in many cases this is neither practical nor needed. For example, the source ontology may be too large for editing tools, use different design patterns, or be at an early stage of development. Nevertheless, individual terms in such ontologies may be well-defined and therefore desirable to reuse. As an example, the Chemical Entities of Biological Interest ontology (ChEBI; http://www.ebi.ac.uk/chebi/) currently includes over 455,000 terms. Importing ChEBI as a whole into a target ontology is impractical given current editing tools (e.g., the Protégé ontology editor [11]) and reasoning tools (e.g., Pellet [12] and FaCT++ [13]). Protégé can handle perhaps a few tens of thousands of terms before becoming too slow to use, and once complex logical restrictions are added, reasoning is no longer interactive with the resources used. As a practical alternative to importing whole ontologies, MIREOT (Minimum Information to Reference an External Ontology Term) was developed in the context of the OBI project [14].
MIREOT proposes selective use of classes from external ontologies that are of direct interest to a target ontology, instead of importing external ontologies as a whole. For example, both the OBI and the VO require the ontology term 'homo sapiens', and have decided to use the NCBI Taxonomy Ontology (NCBITaxon) as a common resource for naming taxonomic groups. The corresponding URI for 'homo sapiens' is http://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606. MIREOT specifies that the minimal information needed to specify reuse of this term is (i) this URI, (ii) the URI of the parent term in the importing ontology (http://purl.obolibrary.org/obo/OBI_0100026, organism), and (iii) the ontology IRI of the source ontology. Based on this minimal information, an automated process can be used to retrieve (and periodically refresh) chosen additional information such as the preferred label for the term and elements of the taxonomic hierarchy. MIREOT is being used in a number of ontology projects, for example, OBI, VO, the Influenza Ontology (InfluenzO; http://sourceforge.net/projects/influenzo/), Neural ElectroMagnetic Ontologies (NEMO; http://nemo.nic.uoregon.edu/wiki/NEMO), ontologies developed in the Neuroscience Information Framework (NIF; https://confluence.crbs.ucsd.edu/display/NIF/), and as part of the eagle-i project (https://www.eagle-i.org/home/). While editing tools commonly provide means to reference an external term by directly setting its URI, one must also manually enter auxiliary information necessary for practical editing, such as the label and definition, and update such information if the source ontology changes. In addition, it is often desirable to import additional related terms. For example, when the Vaccine Ontology imports a species term, the inclusion of some of its superclasses allows for queries at different taxonomic ranks (e.g., kingdom, phylum, and species).
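The three pieces of minimal information described above can be sketched as a small record, together with the kind of SPARQL query an automated process might issue to fetch the label and superclasses of an imported term. This is only an illustrative sketch, not the OBI or OntoFox implementation: the class and function names are hypothetical, the source ontology IRI is a placeholder, the use of a FROM clause assumes the endpoint stores each ontology in a graph named by its IRI, and the `rdfs:subClassOf+` property path assumes a SPARQL 1.1 endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MireotImport:
    """Hypothetical record holding MIREOT's minimal information for one term."""
    source_term_uri: str      # (i) URI of the external term
    target_parent_uri: str    # (ii) URI of the parent term in the importing ontology
    source_ontology_iri: str  # (iii) IRI of the source ontology


def label_query(spec: MireotImport) -> str:
    """Build a SPARQL query that asks for the rdfs:label of the imported term."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label\n"
        f"FROM <{spec.source_ontology_iri}>\n"  # assumes graph name == ontology IRI
        "WHERE {\n"
        f"  <{spec.source_term_uri}> rdfs:label ?label .\n"
        "}"
    )


def superclass_query(spec: MireotImport) -> str:
    """Build a SPARQL 1.1 query for all transitive superclasses of the term."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?ancestor\n"
        f"FROM <{spec.source_ontology_iri}>\n"
        "WHERE {\n"
        f"  <{spec.source_term_uri}> rdfs:subClassOf+ ?ancestor .\n"
        "}"
    )


# The term and parent URIs are the ones quoted in the text; the source
# ontology IRI below is a placeholder chosen for illustration only.
example = MireotImport(
    source_term_uri="http://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606",
    target_parent_uri="http://purl.obolibrary.org/obo/OBI_0100026",
    source_ontology_iri="http://example.org/source-ontology",
)

print(label_query(example))
print(superclass_query(example))
```

The query strings would then be sent to whichever SPARQL endpoint hosts the source ontology, and the results attached to the term under the parent given by `target_parent_uri`.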
To address these issues, an initial implementation based on MIREOT was created to manage the tedious aspects of this process automatically (http://obi-ontology.org/page/MIREOT). The developers of the MIREOT guideline recognize that such an approach is a balanced compromise. Importing only selected information means that incomplete or incorrect inferences could conceivably be made. Technical approaches such as module extraction [12,15-17] promise to preserve correct inference, under a variety of assumptions, by computationally selecting portions of an ontology. Recent work on modularization casts it as a process that fragments existing ontologies into a set of smaller and possibly interconnected parts or modules [12,15-18] that can then be reused as units of ontology [19]. There have been several approaches to computing modules [20]. Structural approaches use the syntax of the axioms of ontologies and mostly consider only the induced is-a hierarchy [17,21]. Logic-based approaches take into account the consequences of ontologies and require that the extracted module capture the meaning of the imported terms, i.e., that it include all axioms relevant to the meaning of these terms. However, Grau et al. [22] proved that it is undecidable, even for description logics simpler than OWL-DL, to determine whether a subset of an ontology is a minimal logic-based module. These approaches are relatively new, experience using them is limited, and in our experience current Web-based implementations are unreliable. Moreover, the methods do not provide ways to avoid importing certain terms or axioms that might not be considered desirable, or they have other issues that prevent their easy use [23]. Nonetheless, the syntactic locality approach these methods use is applicable to single-term import and so is compatible with the MIREOT approach. The OBI project has an implementation of the MIREOT mechanism that demonstrates the feasibility of the approach.
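As a rough illustration of the structural approaches mentioned above, the following sketch collects a set of signature terms together with every ancestor reachable through is-a links. It is a toy example under stated assumptions, not any of the cited extraction algorithms: the function name is hypothetical, and the hierarchy is a hand-written, heavily simplified fragment (intermediate taxonomic ranks are omitted) rather than data taken from NCBITaxon.

```python
from collections import deque

# Toy is-a hierarchy: child -> list of parents. Simplified for illustration;
# intermediate ranks are omitted and this is not actual NCBITaxon content.
IS_A = {
    "Homo sapiens": ["Homo"],
    "Homo": ["Hominidae"],
    "Hominidae": ["Primates"],
    "Primates": ["Mammalia"],
    "Mus musculus": ["Mus"],
    "Mus": ["Muridae"],
    "Muridae": ["Rodentia"],
    "Rodentia": ["Mammalia"],
}


def is_a_closure(signature, is_a):
    """Return the signature terms plus all ancestors reachable via is-a."""
    seen = set(signature)
    queue = deque(signature)
    while queue:
        term = queue.popleft()
        # Terms without an entry are treated as roots of the hierarchy.
        for parent in is_a.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


module = is_a_closure({"Homo sapiens"}, IS_A)
print(sorted(module))
```

Because only the is-a links are followed, such a fragment keeps the taxonomic context of the signature terms but ignores any other axioms, which is the limitation that logic-based approaches aim to address.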
This implementation is, however, command-line based and requires terms to be specified either through command-line scripts or by constructing an ontology document. The ancillary information to be incorporated is specified by writing SPARQL queries [24], which restricts adoption by less technically able users. To facilitate application of the MIREOT guideline by the wider ontology community a more u

[1] M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000, Nature Genetics.

[2] Luigi Iannone, et al. Ontology module extraction for ontology reuse: an ontology engineering perspective, 2007, CIKM '07.

[3] A. Rector, et al. Relations in biomedical ontologies, 2005, Genome Biology.

[4] Bjoern Peters, et al. VO: Vaccine Ontology, 2009.

[5] Hongfang Liu, et al. Framework for a Protein Ontology, 2007, BMC Bioinformatics.

[6] Anand Kumar, et al. Text mining and ontologies in biomedicine: Making sense of raw text, 2005, Briefings Bioinform.

[7] Ian Horrocks, et al. Modular Reuse of Ontologies: Theory and Practice, 2008, J. Artif. Intell. Res.

[8] Cynthia L. Smith, et al. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information, 2004, Genome Biology.

[9] M. Ashburner, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, 2007, Nature Biotechnology.

[10] Roger S. Pressman, et al. Software Engineering: A Practitioner's Approach, 1982.

[11] Alan Ruttenberg, et al. Life sciences on the Semantic Web: the Neurocommons and beyond, 2009, Briefings Bioinform.

[12] Luca Pulina, et al. Minimal Module Extraction from DL-Lite Ontologies Using QBF Solvers, 2009, IJCAI.

[13] Mark A. Musen, et al. Specifying Ontology Views by Traversal, 2004, International Semantic Web Conference.

[14] Alan Ruttenberg, et al. MIREOT: The minimum information to reference an external ontology term, 2009, Appl. Ontology.

[15] Ian Horrocks, et al. Extracting Modules from Ontologies: A Logic-based Approach, 2009, OWLED.

[16] Ian Horrocks, et al. Just the right amount: extracting modules from ontologies, 2007, WWW '07.

[17] Ian Horrocks, et al. FaCT++ Description Logic Reasoner: System Description, 2006, IJCAR.

[18] Yarden Katz, et al. Pellet: A practical OWL-DL reasoner, 2007, J. Web Semant.

[19] Rafael Berlanga Llavori, et al. Safe and Economic Re-Use of Ontologies: A Logic-Based Methodology and Tool Support, 2008, OWLED.

[20] Alan L. Rector, et al. Web ontology segmentation: analysis, classification and use, 2006, WWW '06.

[21] Christopher G. Chute, et al. Survey of modular ontology techniques and their applications in the biomedical domain, 2009, Integr. Comput. Aided Eng.