Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program

The UMLS Metathesaurus, the largest thesaurus in the biomedical domain, provides a representation of biomedical knowledge consisting of concepts classified by semantic type, together with both hierarchical and non-hierarchical relationships among those concepts. This knowledge has proved useful for many applications, including decision support systems, management of patient records, information retrieval (IR), and data mining. Gaining effective access to the knowledge is critical to the success of these applications. This paper describes MetaMap, a program developed at the National Library of Medicine (NLM) to map biomedical text to the Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text. MetaMap uses a knowledge-intensive approach based on symbolic, natural language processing (NLP), and computational linguistic techniques. Besides being applied in both IR and data mining applications, MetaMap is one of the foundations of NLM's Indexing Initiative System, which is being applied to both semi-automatic and fully automatic indexing of the biomedical literature at the library.
