An Evaluation of Annotation Tools for Biomedical Texts

Biomedical texts are a rich source of information. Several text annotation tools can be used to extract useful information from these texts. However, their multi-domain nature and the diversity of ontologies available in this area demand careful analysis before an annotation tool is chosen. This work presents an evaluation of existing annotation tools, with a focus on biomedical texts. First, tools were selected based on a set of required characteristics. AutoMeta and GATE were then chosen for a more detailed evaluation, both quantitative and qualitative. The results of this evaluation are discussed and highlight the strengths and weaknesses of each tool.

1. Introduction

The constant growth of data and publications in the biomedical area has driven the creation and reuse of domain ontologies, not only for structured data annotation but also for text indexing and annotation. Text bases in particular are a rich source for information extraction, since many biomedical findings are available only in textual form. PubMed 1 is one of the most popular digital biomedical citation references (more than 21 million texts). Each citation is indexed using the MeSH 2 thesaurus. However, to facilitate the extraction of information from texts, more automated and detailed indexing is required.

Biomedical texts are typically multi-domain and require different ontologies for their annotation. Together, the Open Biological and Biomedical Ontologies (OBO) Foundry [Smith et al. 2007] and the NCBO BioPortal [Noy et al. 2009] provide more than 300 ontologies. The motivation of this work is to provide support for annotation with multiple ontologies. For instance, a paper about drug targets usually refers to proteins, diseases, organisms, pharmacogenomics, etc. Each of these terms can be annotated by a different domain ontology, such as GO (Gene Ontology) [The Gene Ontology Consortium 2000] for gene and protein annotations, NCBITaxon 3 (NCBI organismal classification) for organisms, and PHARE (The PHArmacogenomic

1 http://www.ncbi.nlm.nih.gov/pubmed/
2 http://www.nlm.nih.gov/mesh/
3 http://bioportal.bioontology.org/ontologies/1132/

[1] Arthur Stutt, et al. MnM: Ontology Driven Semi-automatic and Automatic Support for Semantic Markup, 2002, EKAW.

[2] Alexiei Dingli, et al. Timely and Non-Intrusive Active Document Annotation via Adaptive Information Extraction, 2002, SAAKM@ECAI.

[3] Maria Cláudia Reis Cavalcanti, et al. Applying Graph Partitioning Techniques to Modularize Large Ontologies, 2012, ONTOBRAS-MOST.

[4] Christopher G. Chute, et al. BioPortal: ontologies and integrated data resources at the click of a mouse, 2009, Nucleic Acids Res.

[5] M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000, Nature Genetics.

[6] Atanas Kiryakov, et al. Towards Semantic Web Information Extraction, 2003.

[7] Atanas Kiryakov, et al. KIM - Semantic Annotation Platform, 2003, SEMWEB.

[8] Melania Duma. RDFa Editor for Ontological Annotation, 2011, RANLP Student Research Workshop.

[9] Ali Khalili, et al. The RDFa Content Editor - From WYSIWYG to WYSIWYM, 2012, 2012 IEEE 36th Annual Computer Software and Applications Conference.

[10] Ladislav Hluchý, et al. Ontology based Text Annotation - OnTeA, 2006, EJC.

[11] M. Ashburner, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, 2007, Nature Biotechnology.

[12] Wendy Hall, et al. The Semantic Web Revisited, 2006, IEEE Intelligent Systems.

[13] David W. Embley, et al. Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies, 2006, ASWC.

[14] Timos K. Sellis, et al. Integrating Keywords and Semantics on Document Annotation and Search, 2010, OTM Conferences.

[15] Marja-Riitta Koivunen, et al. Annotea: an open RDF infrastructure for shared Web annotations, 2001, WWW '01.