Clinical Acronym/Abbreviation Normalization using a Hybrid Approach

A distinctive characteristic of clinical text is the pervasive use of acronyms and abbreviations, which are often ambiguous. In 2013, the ShARe/CLEF eHealth Evaluation Lab organized three shared tasks on clinical natural language processing (NLP) and information retrieval (IR), one of which was to normalize acronyms and abbreviations to UMLS concept unique identifiers (CUIs). This paper describes a hybrid system that combines several word sense disambiguation (WSD) methods with existing knowledge bases to normalize and encode clinical abbreviations. On the independent test set, our system achieved an accuracy of 0.719, ranking first in the challenge.
