Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification

This paper explores deep learning methods for information extraction from free-text medical imaging reports at a multi-institutional scale and compares them with a state-of-the-art domain-specific rule-based system (PEFinder) and with traditional machine learning methods (SVM and AdaBoost). We propose two distinct deep learning models, (i) CNN Word-GloVe and (ii) a domain phrase attention-based hierarchical recurrent neural network (DPA-HNN), for synthesizing information on pulmonary emboli (PE) from over 7370 clinical thoracic computed tomography (CT) free-text radiology reports collected from four major healthcare centers. The DPA-HNN model encodes domain-dependent phrases into an attention mechanism and represents a radiology report through a hierarchical RNN structure composed of word-level, sentence-level, and document-level representations. Experimental results suggest that deep learning models trained on a single institution's dataset outperform the rule-based PEFinder on our multi-institutional test sets. The best F1 score for the presence of PE was 0.99 in an adult patient population (DPA-HNN) and 0.99 in a pediatric population (HNN), which shows that deep learning models trained on adult data generalized to a pediatric population with comparable accuracy. Our work suggests the feasibility of broader use of neural network models for automated classification of multi-institutional imaging text reports in a variety of applications, including evaluation of imaging utilization, imaging yield, and clinical decision support tools, and as part of automated classification of large corpora for medical imaging deep learning work.
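
The results above are reported as F1 scores for the presence of PE. As a minimal sketch of how such a score could be computed, assuming the standard definition (the harmonic mean of precision and recall) and binary report labels where 1 means PE present: the function name `f1_score` and the toy labels below are illustrative assumptions, not data or code from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the `positive` class of a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # With no true positives, precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight reports (1 = PE present, 0 = PE absent).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(round(score, 2))
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4.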
