Benchmarking Clinical Speech Recognition and Information Extraction: New Data, Methods, and Evaluations

Background: Over a tenth of preventable adverse events in health care are caused by failures in information flow. These failures are tangible in clinical handover: even when verbal handover is good, between two-thirds and all of the handed-over information is lost after 3 to 5 shifts if notes are taken by hand or not at all. Speech recognition and information extraction offer a way to fill out a handover form that clinicians then proof and sign off.

Objective: The objective of the study was to provide a recorded spoken handover, annotated verbatim transcriptions, and evaluations to support research in spoken and written natural language processing for filling out a clinical handover form. The dataset is based on synthetic patient profiles, which avoids ethical and legal restrictions while keeping it useful for research on speech-to-text conversion and information extraction in realistic clinical scenarios. We also introduce a Web app to demonstrate the system design and workflow.

Methods: We experiment with Dragon Medical 11.0 for speech recognition and CRF++ for information extraction. To compute features for information extraction, we also apply CoreNLP, MetaMap, and Ontoserver. Our evaluation uses cross-validation techniques to measure processing correctness.

Results: The data consist of simulated nursing handovers, built from simulated patient records and handover scripts, spoken by an Australian registered nurse and recorded on a mobile device. Speech recognition correctly recognized 5276 of the 7277 words (approximately 72.5%) in our 100 test documents. Information extraction used 50 mutually exclusive categories; on our 101 test documents, it achieved an F1 (ie, the harmonic mean of Precision and Recall) of 0.86 for the irrelevant-text category and a macro-averaged F1 of 0.70 over the remaining 35 nonempty categories of the form.

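The Results figures rest on a few simple metrics. The following is a minimal sketch, not the study's evaluation software, assuming token-level true-positive, false-positive, and false-negative counts for each form category. The `Counts` structure, the `"NA"` label for the irrelevant-text category, the rule that a category is "nonempty" when it has at least one gold-standard instance, and all category names are hypothetical. Macro-averaging is shown as the unweighted mean of per-category F1, which is one common convention; the study's own scripts may define it differently, for example as the harmonic mean of macro-averaged Precision and Recall.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Token-level counts for one category (hypothetical structure)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1(c: Counts) -> float:
    """F1 as the harmonic mean of Precision and Recall; 0.0 when undefined."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_category: dict[str, Counts], exclude: str = "NA") -> float:
    """Unweighted mean of per-category F1 over nonempty categories.

    `exclude` names the irrelevant-text category ("NA" is an assumed label).
    A category is treated as nonempty if it has at least one gold-standard
    instance (tp + fn > 0), which is also an assumption.
    """
    scores = [
        f1(c)
        for name, c in per_category.items()
        if name != exclude and c.tp + c.fn > 0
    ]
    return sum(scores) / len(scores) if scores else 0.0


def word_correctness(correct: int, total: int) -> float:
    """Share of reference words that the recognizer got right."""
    return correct / total


if __name__ == "__main__":
    # Figures reported in the abstract: 5276 of 7277 words correct.
    print(f"{word_correctness(5276, 7277):.3f}")  # 0.725

    # Made-up counts with hypothetical category names, for illustration only.
    toy = {
        "NA": Counts(tp=80, fp=10, fn=10),
        "PatientName": Counts(tp=9, fp=1, fn=1),
        "Diagnosis": Counts(tp=3, fp=3, fn=1),
        "Allergy": Counts(tp=0, fp=2, fn=0),  # no gold instances, so skipped
    }
    print(f"{f1(toy['NA']):.3f}")   # 0.889
    print(f"{macro_f1(toy):.3f}")   # 0.750
```

Because macro-averaging weights every category equally, a rare category counts as much as a frequent one; reporting the dominant irrelevant-text category separately keeps it from masking performance on the form's content categories.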
Conclusions: The significance of this study lies in opening our data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language processing. The data are used in the CLEF eHealth 2015 evaluation laboratory for a shared task on speech recognition.