Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing

The ideal radiology report reduces diagnostic uncertainty while avoiding ambiguity whenever possible. The purpose of this study was to characterize the use of uncertainty terms in radiology reports at a single institution and to compare the use of these terms across imaging modalities, anatomic sections, patient characteristics, and radiologist characteristics. We hypothesized that the use of uncertainty terms would vary among radiologists and between subspecialties within radiology, and that the length of a report's impression would predict the use of uncertainty terms. Finally, we hypothesized that human readers would often interpret the use of uncertainty terms as “hedging.” To test these hypotheses, we created a natural language processing (NLP) algorithm that detects and counts occurrences of a published set of uncertainty terms within radiology reports. All 642,569 radiology report impressions from 171 reporting radiologists were collected from 2011 through 2015. For validation, two radiologists without knowledge of the software algorithm reviewed report impressions and were asked to determine whether each report was “uncertain” or “hedging.” The presence of one or more uncertainty terms was then compared with the human readers’ assessment. The proportion of reports containing uncertainty terms differed significantly across patient admission status and across anatomic imaging subsections. Reports with uncertainty were significantly longer than those without, although report length was not significantly different between subspecialties or modalities. Rates of uncertainty did not differ significantly with the experience of the attending radiologist. When compared with reader 1 as a gold standard, accuracy was 0.91, sensitivity was 0.92, specificity was 0.9, and precision was 0.88, with an F1-score of 0.9. When compared with reader 2, accuracy was 0.84, sensitivity was 0.88, specificity was 0.82, and precision was 0.68, with an F1-score of 0.77. Substantial variability exists among radiologists and subspecialties regarding the use of uncertainty terms, and this variability cannot be explained by years of radiologist experience or differences in proportions of specific modalities. Furthermore, detection of uncertainty terms demonstrates good test characteristics for predicting human readers’ assessment of uncertainty.
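
The approach described above (flag an impression when it contains one or more uncertainty terms, then score those flags against a human reader) can be illustrated with a minimal Python sketch. This is not the study's algorithm: the term list, the regular-expression matching, and the function names `count_uncertainty_terms`, `classification_metrics`, and `f1_score` are all assumptions made for illustration, and the published lexicon used in the study is not reproduced here.

```python
import re

# Hypothetical term list for illustration only; the study used a published
# lexicon that is not reproduced in the abstract.
UNCERTAINTY_TERMS = [
    "possible", "possibly", "probable", "probably",
    "may represent", "cannot be excluded", "suggestive of",
]

# One case-insensitive pattern that matches any term on word boundaries.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in UNCERTAINTY_TERMS) + r")\b",
    re.IGNORECASE,
)


def count_uncertainty_terms(impression: str) -> int:
    """Count occurrences of the listed uncertainty terms in an impression."""
    return len(_PATTERN.findall(impression))


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    total = precision + sensitivity
    return 2 * precision * sensitivity / total if total else 0.0


def classification_metrics(predicted, reference):
    """Score boolean algorithm flags against a reader's boolean labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


if __name__ == "__main__":
    text = "Possible pneumonia; malignancy cannot be excluded."
    print(count_uncertainty_terms(text))
    # Consistency check on the reported reader 2 figures.
    print(round(f1_score(0.68, 0.88), 2))
```

As a check on the reported figures, the F1-score is the harmonic mean of precision and sensitivity: for reader 2, 2 × 0.68 × 0.88 / (0.68 + 0.88) ≈ 0.77, and for reader 1, 2 × 0.88 × 0.92 / (0.88 + 0.92) ≈ 0.90, both matching the values in the abstract.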