Detecting abbreviations in discharge summaries using machine learning methods.

Recognizing and identifying abbreviations is an important and challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method for creating a lexical resource of clinical abbreviations using machine learning (ML) methods and test its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly divided into a training set (40 documents) and a test set (30 documents). The authors implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation on the test set showed that the Random Forest classifier had the highest F-measure among the individual classifiers, 94.8% (precision 98.8%, recall 91.2%). When a voting scheme was used to combine the output of the various ML classifiers, the system achieved a higher F-measure of 95.7%.
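To make the reported numbers concrete, the following is a minimal Python sketch of the two computations the abstract relies on: the F-measure as the harmonic mean of precision and recall, and a voting scheme that combines per-token predictions from several classifiers. The abstract does not specify how votes were combined, so simple majority voting is an assumption here, and the function names `f_measure` and `majority_vote` are hypothetical, not taken from the paper.

```python
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions):
    """Combine per-token labels from several classifiers by majority vote.

    `predictions` is a list with one label sequence per classifier; all
    sequences must cover the same tokens in the same order. Assumption:
    plain majority voting; the paper's actual scheme may differ.
    """
    voted = []
    for token_labels in zip(*predictions):
        # most_common(1) returns the label with the highest count.
        voted.append(Counter(token_labels).most_common(1)[0][0])
    return voted


# Check the Random Forest figures reported in the abstract.
rf_f = f_measure(0.988, 0.912)

# Toy example: three classifiers label three tokens
# (1 = abbreviation, 0 = not an abbreviation).
combined = majority_vote([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])
```

With precision 0.988 and recall 0.912, `f_measure` gives roughly 0.948, which agrees with the 94.8% reported for Random Forest. Using an odd number of binary classifiers, as in the toy example, avoids tied votes.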
