Comparing Concept Recognizers for Ontology-Based Indexing: MGREP vs. MetaMap

The National Center for Biomedical Ontology is developing a system for automated, ontology-based access to online biomedical resources. The system’s indexing workflow processes the text metadata of diverse resources, such as datasets from GEO and ArrayExpress, to annotate and index them with concepts from appropriate ontologies. This indexing requires a concept-recognition system to identify ontology concepts in the resource metadata. In this paper, we present a comprehensive comparison of two concept recognizers: NIH’s MetaMap and the University of Michigan’s MGREP. We use a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability, and customizability. Our evaluations demonstrate that MGREP has a clear edge over MetaMap for large-scale applications. Based on our analysis, we also suggest areas of potential improvement for MGREP.
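As an illustration of the precision and recall criteria named above, the following is a minimal sketch of how a recognizer's output might be scored against a reference set of annotations. It is not the evaluation code used in the paper: the function name `precision_recall`, the representation of an annotation as a (document, concept) pair, and the sample identifiers are all assumptions made for the example.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of
    (document_id, concept_id) annotation pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of the recognizer's annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the reference annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up annotations: the identifiers below are placeholders, not real
# ontology concept IDs or results from MGREP or MetaMap.
gold = {("doc1", "C001"), ("doc1", "C002"), ("doc2", "C003"), ("doc2", "C004")}
predicted = {("doc1", "C001"), ("doc2", "C003"), ("doc2", "C999")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three predicted annotations appear in the reference set, and two of the four reference annotations are recovered, so the recognizer in this toy case is more precise than it is complete.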
