Statistical semantic and clinician confidence analysis for correcting abbreviations and spelling errors in clinical progress notes

MOTIVATION: Progress notes are narrative summaries of a patient's status during the course of treatment or care. Under time and efficiency pressures, clinicians continue to prefer unstructured text over form-based data entry when composing progress notes. The ability to extract meaningful data from this unstructured text is invaluable for retrospective analysis and decision support. Automatic extraction, however, has been largely hindered by the complexity of handling abbreviations, misspellings, punctuation errors and other types of noise.

OBJECTIVE: We present a robust system for cleaning noisy progress notes in real time, with a focus on abbreviations and misspellings.

METHODS: The system uses statistical semantic analysis based on Web data, together with the occasional participation of clinicians, to automatically replace abbreviations with their intended senses and misspellings with the correct words.

RESULTS: An accuracy of up to 88.73% was achieved using statistical semantic analysis of Web data alone. With the caching mechanism enabled, the response time of the system is 1.5-2 s per word, which is about the same as the average typing speed of clinicians.

CONCLUSIONS: The overall accuracy and response time of the system are expected to improve over time, especially once the confidence mechanism is activated through clinicians' interactions with the system. The system will be implemented in a clinical information system to drive interactive decision support and analysis functions, leading to improved patient care and outcomes.
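The abstract does not state which statistical semantic measure the system uses. As an illustration only, the sketch below ranks candidate senses of an abbreviation with the Normalized Google Distance, a well-known relatedness measure computed from Web hit counts. The function names, the hit counts and the index size `n` are all hypothetical; a real system would obtain counts from a search engine or corpus and would likely cache them, as the response-time result suggests.

```python
import math


def normalized_distance(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term alone.
    fxy:    number of pages containing both terms.
    n:      assumed total number of indexed pages.
    Smaller values mean the two terms are more closely related.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # no evidence that the terms co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_senses(context_count, candidates, n):
    """Order candidate senses from most to least related to the context.

    candidates maps each sense to (sense_count, joint_count_with_context).
    """
    scored = [
        (normalized_distance(context_count, count, joint, n), sense)
        for sense, (count, joint) in candidates.items()
    ]
    return [sense for _, sense in sorted(scored)]


# Hypothetical counts for the abbreviation "pt" next to the word "discharged".
N_PAGES = 10**10
candidates = {
    "patient": (200_000_000, 2_000_000),
    "physiotherapy": (3_000_000, 20_000),
}
ranked = rank_senses(5_000_000, candidates, N_PAGES)
print(ranked[0])
```

With these made-up counts the sense that co-occurs more strongly with the context word is ranked first. A clinician-confidence mechanism of the kind the abstract mentions could then override or reweight this ranking, but how the system does so is not described here.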
