Use of "off-the-shelf" information extraction algorithms in clinical informatics: A feasibility study of MetaMap annotation of Italian medical notes

Information extraction from narrative clinical notes is useful for patient care, as well as for secondary use of medical data, for research or clinical purposes. Many studies have focused on information extraction from English clinical texts, but fewer have dealt with clinical notes in languages other than English. This study tested the feasibility of using "off-the-shelf" information extraction algorithms to identify medical concepts in Italian clinical notes. Among the available and well-established information extraction algorithms, we used MetaMap to map medical concepts to the Unified Medical Language System (UMLS). The study addressed two questions: (Q1) whether medical terms found in clinical notes and belonging to the "Disorders" semantic group can be properly mapped to the Italian UMLS resources; (Q2) whether it is feasible to use MetaMap as it is to extract these medical concepts from Italian clinical notes. We performed three experiments: in EXP1, we investigated how many medical concepts of the "Disorders" semantic group found in a set of clinical notes written in Italian could be mapped to the UMLS Italian medical sources; in EXP2, we assessed how the different processing steps used by MetaMap, which are English dependent, could be applied to Italian texts to map the original clinical notes to the Italian UMLS sources; in EXP3, we automatically translated the clinical notes from Italian to English using Google Translator, and then used MetaMap to map the translated texts. Results in EXP1 showed that the Italian UMLS Metathesaurus sources covered 91% of the medical terms of the "Disorders" semantic group found in the studied dataset. We observed that, even though MetaMap was built to analyze texts written in English, most of its processing steps also worked properly with texts written in Italian. MetaMap correctly identified about half of the concepts in the Italian clinical notes.
Using MetaMap's annotation on Italian clinical notes instead of a simple text search improved our results by about 15 percentage points. MetaMap's annotation of Italian clinical notes showed recall, precision and F-measure equal to 0.53, 0.98 and 0.69, respectively. Most of the failures were due to MetaMap's inability to generate meaningful variants for the Italian language, suggesting that modifying MetaMap to generate Italian variants could improve its performance. MetaMap's performance in annotating automatically translated English clinical notes was in line with findings in the literature, with similar recall (0.75) and F-measure (0.83) and even higher precision (0.95). Most of the failures were due to poor Italian-to-English translation of medical terms, suggesting that an automatic translation tool specialized in medical concepts might yield better performance. In conclusion, the performance obtained using MetaMap on the fully automatic translation of the Italian text is good enough to allow the use of MetaMap "as it is" in clinical practice.
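
The reported F-measures can be checked against the reported precision and recall, assuming the F-measure here is the usual balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used. The following is a minimal sketch under that assumption. The function name `f_measure` is illustrative and is not part of MetaMap.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the study's F-measure is F1; this is not confirmed by the text.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for MetaMap on the original Italian notes.
f_italian = f_measure(precision=0.98, recall=0.53)

# Values as reported for MetaMap on the automatically translated notes.
f_translated = f_measure(precision=0.95, recall=0.75)

print(f"Italian notes:    F = {f_italian:.3f}")
print(f"Translated notes: F = {f_translated:.3f}")
```

For the Italian notes this reproduces the reported 0.69. For the translated notes the rounded inputs give a value slightly above the reported 0.83; a difference of this size would be expected if the published figure was computed from unrounded precision and recall.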
