The ConceptMapper Approach to Named Entity Recognition

ConceptMapper is an open source tool we created for classifying mentions in unstructured text documents against concept terminologies, yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture; IBM, 2004) annotator, and its concepts come from standardised or proprietary terminologies. ConceptMapper is easily configured, for instance to use different search strategies or syntactic concepts. In this paper we describe ConceptMapper and its configuration parameters, and we examine their trade-offs in terms of precision and recall when identifying concepts in a collection of clinical reports written in English. ConceptMapper is available from the Apache UIMA Sandbox under the Apache open source license.
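To make the idea of terminology-driven concept lookup concrete, the following is a minimal Python sketch, not ConceptMapper's actual implementation or API. It assumes a toy terminology with made-up concept codes, a simple case-insensitive tokenizer, and a single longest-match search strategy; the names `TERMINOLOGY`, `find_concepts`, and `precision_recall` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy terminology: surface form -> made-up concept code.
TERMINOLOGY = {
    "colon": "C-001",
    "adenocarcinoma": "C-002",
    "adenocarcinoma of colon": "C-003",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(terminology):
    """Map each term's token tuple to its concept code."""
    return {tuple(tokenize(term)): code for term, code in terminology.items()}


def find_concepts(text, index):
    """Scan left to right, preferring the longest matching term (one possible strategy)."""
    tokens = tokenize(text)
    max_len = max(len(key) for key in index)
    mentions = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                mentions.append((" ".join(key), index[key]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts at this token
    return mentions


def precision_recall(predicted, gold):
    """Compute precision and recall over sets of concept codes."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Under these assumptions, a different search strategy (for example, reporting every overlapping match rather than only the longest one) would typically find more concepts at the cost of more spurious ones, which is the kind of precision and recall trade-off the abstract refers to.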
