As knowledge engineering moves to the (Semantic) Web, ontologies become dynamic products of collaborative development rather than artifacts produced in the closed environment of a single research group. We examine today's large collaborative ontology-development projects, particularly in the domain of biomedicine, and outline some requirements for the tools to support this enterprise. We then present our initial prototype of Collaborative Protege, an extension to the Protege ontology-editing environment that enables distributed users to develop ontologies collaboratively and that provides an integrated platform for discussions.

Collaborative Ontology Development in Biomedicine

The biomedical community has embraced ontologies probably more than any other discipline. From the implementation of hospital information systems to the organization of experimental data for bioinformatics research, developers now identify the key issue to be the manner in which salient concepts are labeled and defined, and ultimately used computationally. This adoption brings the next challenge, however: ontologies and terminologies become so large, diverse, and specialized that it is often impossible for any single centralized group to develop them effectively. Indeed, the following projects represent just some of the most visible ontology-engineering initiatives that are incorporating community participation as a key element in their development work.

The Gene Ontology (GO) is probably one of the more prominent examples of an ontology that is a product of a collaborative process (Hartel et al. 2005). GO provides terminology for consistent, species-independent description of gene products in different model-organism databases in terms of their associated biological processes, cellular components, and molecular functions. Members of the GO community constantly suggest new terms for this ontology.
Three full-time curators examine the suggestions and incorporate them into GO on a continual basis.

The International Classification of Diseases (ICD; http://www.who.int/classifications/icd/en/) is a public global standard to organize and classify information about diseases and related health problems. The World Health Organization (WHO) plans three major shifts for the upcoming 11th revision of ICD (ICD-11). First, the ICD will represent clinical knowledge explicitly in machine-processable form. WHO plans to use an ontological approach, formalizing the definitions of each clinical entity and organizing the terms in a semantically meaningful way. Second, WHO will open the process of ICD revision to a wide community of experts. Topic Advisory Groups (TAGs) will serve as planning and coordinating bodies for specific areas of medicine, such as Oncology, Mental Health, and Communicable Diseases. Each TAG will support several international working groups and an additional corps of field testers who will use on-line tools to evaluate the evolving ontology and to generate proposals for revisions and enhancements. Third, ICD-11 will include direct linkages to terms in other standardized terminologies, such as SNOMED-CT.

The National Cancer Institute's Thesaurus (NCI Thesaurus), developed at the NCI Center for Bioinformatics (NCICB), is a biomedical reference ontology that covers areas of basic cancer biology, translational science, and clinical oncology (Fragoso et al. 2004; Sioutos et al. 2007). Currently, the NCI Thesaurus is used to index documents, to support the NCI Cancer portal (http://cancer.gov), to serve as the terminology source for a number of applications such as the NCI Drug Dictionary (http://www.cancer.gov/drugdictionary/), and to annotate metadata in the Cancer Bioinformatics Grid (caBIG; http://cabig.nci.nih.gov). Recently, the NCICB has launched a new terminology product, Biomedical Grid Terminology (BiomedGT), to support the needs of its NCICB partners. This new terminology restructures the NCI Thesaurus to facilitate terminology federation and open content development. The goal of BiomedGT is to empower the wider biomedical research community to participate directly and collaboratively in extending and refining the terminology on which they depend.

The Ontology for Biomedical Investigations (OBI; http://obi.sourceforge.net/consortium/), a product of the OBI Consortium, is a federated ontology being developed collaboratively. OBI describes biological and

[1] Cornelius Rosse, et al. A Reference Ontology for Bioinformatics: The Foundational Model of Anatomy. 2003.
[2] Sherri de Coronado, et al. NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information. J. Biomed. Informatics, 2007.
[3] Mark A. Musen, et al. Ontology versioning in an ontology management framework. IEEE Intelligent Systems, 2004.
[4] Eric Prud'hommeaux, et al. Annotea: an open RDF infrastructure for shared Web annotations. Comput. Networks, 2002.
[5] Alan L. Rector, et al. Web ontology segmentation: analysis, classification and use. WWW '06, 2006.
[6] M. Ashburner, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 2007.
[7] Jennifer Golbeck, et al. Modeling a description logic vocabulary for cancer research. J. Biomed. Informatics, 2005.
[8] Harith Alani, et al. The CKC Challenge: Exploring Tools for Collaborative Knowledge Construction. IEEE Intelligent Systems, 2008.
[9] Mark A. Musen, et al. A Framework for Ontology Evolution in Collaborative Environments. SEMWEB, 2006.
[10] Larry Wright, et al. Overview and Utilization of the NCI Thesaurus. Comparative and Functional Genomics, 2004.