Paper information - Detecting abbreviations in discharge summaries using machine learning methods.


Recognizing and identifying abbreviations is an important and challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have broad applicability. The authors present a corpus-based method for creating a lexical resource of clinical abbreviations using machine learning (ML) methods and test its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in 70 discharge summaries, which were randomly divided into a training set (40 documents) and a test set (30 documents). The authors implemented and evaluated several ML algorithms using the training set and a list of predefined features. The subsequent evaluation on the test set showed that the Random Forest classifier achieved the highest F-measure among the individual classifiers, 94.8% (precision 98.8%, recall 91.2%). When a voting scheme was used to combine the output of several ML classifiers, the system achieved its highest overall F-measure, 95.7%.
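The reported scores can be checked with a short calculation, and the idea of combining classifiers by voting can be sketched in a few lines. The following is a minimal illustration only: it assumes the F-measure is the balanced harmonic mean of precision and recall, and it assumes a strict-majority vote over per-token binary predictions. The abstract does not specify the actual voting scheme, and the function names `f_measure` and `majority_vote` are hypothetical.

```python
from typing import List


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions: List[List[bool]]) -> List[bool]:
    """Combine per-token predictions from several classifiers.

    predictions[i][j] is True if classifier i labels token j as an
    abbreviation. A token is accepted only when a strict majority of
    classifiers agree (an assumed scheme, not taken from the paper).
    """
    n_classifiers = len(predictions)
    return [sum(votes) * 2 > n_classifiers for votes in zip(*predictions)]


# Precision and recall reported for the Random Forest classifier.
print(round(f_measure(0.988, 0.912), 3))  # → 0.948

# Three hypothetical classifiers voting on four tokens.
votes = [
    [True, False, True, False],
    [True, True, False, False],
    [False, True, True, False],
]
print(majority_vote(votes))  # → [True, True, True, False]
```

With precision 98.8% and recall 91.2%, the harmonic mean rounds to 94.8%, which matches the F-measure quoted for the Random Forest classifier.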
