Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification

While radiologists regularly issue follow-up recommendations, our preliminary research has shown that 35% to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm that automatically detects free-text radiology reports containing a follow-up recommendation, using natural language processing (NLP) techniques and machine learning models. The data set consists of 6000 free-text reports from the authors' institution. NLP techniques are used to engineer 1500 features, which comprise the most informative unigrams, bigrams, and trigrams in the training corpus after tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. We analyzed the models to determine predictive features; the term frequencies of n-grams such as "renal neoplasm" and "evalu with enhanc" were most predictive of a follow-up recommendation. Key to maximizing performance were feature engineering that extracts predictive information and selection of machine learning algorithms appropriate to the feature set.
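
The feature-engineering step described above (tokenize, stem, extract unigrams through trigrams, keep the most informative ones) can be sketched roughly as follows. This is a minimal illustration, not the study's implementation. Several pieces are assumptions: the function names are hypothetical, `crude_stem` is a simplified stand-in for Porter stemming, information gain is one plausible reading of "most informative", and the four toy reports are invented for the example. The abstract does not say how features were ranked.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Simplified stand-in for Porter stemming (assumption): strip a few suffixes."""
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngram_counts(text, max_n=3):
    """Term frequencies of all stemmed unigrams, bigrams, and trigrams."""
    stems = [crude_stem(t) for t in tokenize(text)]
    return Counter(
        " ".join(stems[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(stems) - n + 1)
    )


def entropy(positives, total):
    """Binary entropy (bits) of a label set with `positives` ones out of `total`."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a report."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (
        entropy(sum(labels), n)
        - len(present) / n * entropy(sum(present), len(present))
        - len(absent) / n * entropy(sum(absent), len(absent))
    )


def select_features(docs, labels, k):
    """Keep the k n-grams with the highest information gain (ties broken by name)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


def f1_and_accuracy(y_true, y_pred):
    """F1 for the positive class (follow-up recommended) and overall accuracy."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return f1, correct / len(y_true)


# Invented toy corpus: 1 = report contains a follow-up recommendation.
reports = [
    "Renal neoplasm suspected. Recommend follow-up MRI.",
    "Indeterminate renal lesion, evaluate with enhanced CT.",
    "No acute abnormality.",
    "Normal liver and kidneys. No acute findings.",
]
labels = [1, 1, 0, 0]
docs = [ngram_counts(r) for r in reports]
top_features = select_features(docs, labels, k=4)
```

In the study, the selected n-gram term frequencies would then feed the naive Bayes, decision tree, and maximum entropy classifiers; the sketch stops at feature selection. The `f1_and_accuracy` helper is included because the two metrics measure different things: accuracy counts all correct predictions, while F1 reflects only performance on the positive class, so the two can diverge substantially when one class is much rarer than the other.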
