Interactive NLP in Clinical Care: Identifying Incidental Findings in Radiology Reports

BACKGROUND  Despite advances in natural language processing (NLP), extracting information from clinical text is expensive. Interactive tools that ease the construction, review, and revision of NLP models can reduce this cost and improve the utility of clinical reports for clinical and secondary use.

OBJECTIVES  We present the design and implementation of an interactive NLP tool for identifying incidental findings in radiology reports, along with a user study evaluating the performance and usability of the tool.

METHODS  Expert reviewers provided gold standard annotations for 130 patient encounters (694 reports) at sentence, section, and report levels. We performed a user study with 15 physicians to evaluate the accuracy and usability of our tool. Participants reviewed encounters split into intervention (with predictions) and control (no predictions) conditions. We measured changes in model performance, the time spent, and the number of user actions needed. The System Usability Scale (SUS) and an open-ended questionnaire were used to assess usability.

RESULTS  Starting from bootstrapped models trained on 6 patient encounters, we observed an average increase in F1 score from 0.31 to 0.75 for reports, from 0.32 to 0.68 for sections, and from 0.22 to 0.60 for sentences on a held-out test data set, over an hour-long study session. We found that the tool significantly reduced the time spent reviewing encounters (134.30 vs. 148.44 seconds in intervention and control, respectively), while maintaining the overall quality of labels as measured against the gold standard. The tool was well received by the study participants, with a very good overall SUS score of 78.67.

CONCLUSION  The user study demonstrated successful use of the tool by physicians for identifying incidental findings. These results support the viability of adopting interactive NLP tools in clinical care settings for a wider range of clinical applications.
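For readers unfamiliar with how an overall SUS score such as the one reported above is typically obtained, the following is a minimal sketch of the standard SUS scoring rule (ten items rated 1 to 5; odd items contribute the rating minus 1, even items contribute 5 minus the rating; the sum is scaled by 2.5 to a 0 to 100 range). The function names `sus_score` and `mean_sus` are hypothetical, and averaging per-participant scores is an assumption about how a study-level score would be derived, not a description of the authors' analysis code.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's ten SUS responses (each an integer 1..5)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each SUS response must be between 1 and 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: higher is better.
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: lower is better.
            total += 5 - rating
    # Each item contributes 0..4, so the sum is 0..40; scale to 0..100.
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Average per-participant SUS scores (assumed aggregation)."""
    return mean(sus_score(r) for r in all_responses)
```

Under this rule, a participant who answers 3 on every item scores 50, which is why SUS values are usually read against benchmarks rather than as percentages.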
