Biomedical term normalization of EHRs with UMLS

This paper presents a novel prototype for biomedical term normalization of electronic health record excerpts with the Unified Medical Language System (UMLS) Metathesaurus. Although the prototype is multilingual and cross-lingual by design, we first focus on processing clinical text in Spanish, because no tool existed for this language and this specific purpose. The tool uses Apache Lucene to index the Metathesaurus and to generate mapping candidates from input text. It relies on the IXA pipeline for basic language processing and resolves ambiguities with the UKB toolkit. We evaluated it by measuring its agreement with MetaMap on two English-Spanish parallel corpora. In addition, we present a web-based interface for the tool.
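The abstract says only that Lucene indexes the Metathesaurus and proposes mapping candidates from input text. The Python sketch below shows the general idea of index-based candidate generation. It is not the tool's implementation: Lucene is replaced by a toy in-memory inverted index, and the concept identifiers (CUI_A, CUI_B, CUI_C), concept strings, stopword list and coverage score are all hypothetical. The sketch also skips linguistic processing (IXA pipeline) and disambiguation (UKB). It simply ranks every concept that shares tokens with the input.

```python
import re
from collections import defaultdict

# Hypothetical toy "Metathesaurus": identifiers and strings are invented for illustration.
CONCEPTS = {
    "CUI_A": ["infarto agudo de miocardio", "ataque cardiaco"],
    "CUI_B": ["infarto cerebral"],
    "CUI_C": ["diabetes mellitus"],
}

# Hypothetical minimal Spanish stopword list.
STOPWORDS = {"de", "del", "la", "el", "y", "en", "con"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep runs of (Spanish) letters, and drop stopwords."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(concepts):
    """Map each token to the (concept_id, string_number) entries that contain it."""
    index = defaultdict(set)
    strings = {}
    for cui, names in concepts.items():
        for i, name in enumerate(names):
            key = (cui, i)
            toks = set(tokenize(name))
            strings[key] = toks
            for tok in toks:
                index[tok].add(key)
    return index, strings


def candidates(text, index, strings):
    """Rank concepts by the best fraction of one of their strings' tokens found in the text."""
    query = set(tokenize(text))
    best = {}
    for tok in query:
        for key in index.get(tok, ()):
            coverage = len(strings[key] & query) / len(strings[key])
            best[key[0]] = max(best.get(key[0], 0.0), coverage)
    # Highest coverage first; ties broken by concept identifier.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index, strings = build_index(CONCEPTS)
    print(candidates("Paciente con infarto agudo de miocardio", index, strings))
    # prints [('CUI_A', 1.0), ('CUI_B', 0.5)]
```

In the example, "infarto" matches both a cardiac and a cerebral concept. In the described tool, choosing between such competing candidates is the job of the UKB disambiguation step rather than a simple coverage score.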
