Automated radiology report generation using conditioned transformers

Abstract Radiology report writing in hospitals is a time-consuming task that also requires experience from the involved radiologists. This paper proposes a deep learning model to automatically generate radiology reports given a chest x-ray image from the public IU-Xray dataset. Our work consists of three stages: (1) Fine-tune a pre-trained Chexnet to predict specific tags from the image. (2) Calculate weighted semantic features from the predicted tag's pre-trained embeddings. (3) Condition a pre-trained GPT2 model on the visual and semantic features to generate the full medical reports. We analyze the generated reports using word-overlap metrics while also adding new meaningful semantic-based similarity metrics. The proposed model, which we call CDGPT2, surpassed most non-hierarchical recurrent models and transformer-based models in quantitative metrics while being considerably faster to train. Moreover, the model does not require a specific vocabulary and can be trained on different datasets without changing the architecture. Furthermore, we include a qualitative analysis from a radiologist from Egypt's national institute of cancer which showed that 61.6% of the generated reports on the test set were expertly written, and only 10% contained false information. We represent the first work to condition a pre-trained transformer on visual and semantic features to generate medical reports and to include semantic similarity metrics in the quantitative analysis of the generated reports.

[1]  Vasile Rus,et al.  An Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics , 2012, ITS.

[2]  Anup Pillai,et al.  Chest X-ray Report Generation through Fine-Grained Label Learning , 2020, MICCAI.

[3]  Andrew Y. Ng,et al.  CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning , 2017, ArXiv.

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Jiebo Luo,et al.  Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment , 2019, MICCAI.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[10]  Jiebo Luo,et al.  Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[12]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[13]  Lin Yang,et al.  MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Daguang Xu,et al.  When Radiology Report Generation Meets Knowledge Graph , 2020, AAAI.

[15]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[16]  Tsung-Hui Chang,et al.  Generating Radiology Reports via Memory-driven Transformer , 2020, EMNLP.

[17]  Jonathan Krause,et al.  A Hierarchical Approach for Generating Descriptive Image Paragraphs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[19]  Pengtao Xie,et al.  On the Automatic Generation of Medical Imaging Reports , 2017, ACL.

[20]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[21]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[22]  Yifan Yu,et al.  CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison , 2019, AAAI.

[23]  Ronald M. Summers,et al.  Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Sanja Fidler,et al.  Skip-Thought Vectors , 2015, NIPS.

[25]  Chun-Nan Hsu,et al.  Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays , 2020, FINDINGS.

[26]  Hannes Schulz,et al.  Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation , 2017, ArXiv.

[27]  Clement J. McDonald,et al.  Preparing a collection of radiology examinations for distribution and retrieval , 2015, J. Am. Medical Informatics Assoc..

[28]  Thomas Wolf,et al.  DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.

[29]  Thomas Wolf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[30]  Pingkun Yan,et al.  Reinforced Transformer for Medical Image Captioning , 2019, MLMI@MICCAI.

[31]  Ion Androutsopoulos,et al.  Deep Relevance Ranking Using Enhanced Document-Query Interactions , 2018, EMNLP.

[32]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[33]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[34]  Vasiliki Kougia,et al.  A Survey on Biomedical Image Captioning , 2019, Proceedings of the Second Workshop on Shortcomings in Vision and Language.

[35]  Alexander M. Rush,et al.  Encoder-Agnostic Adaptation for Conditional Language Generation , 2019, ArXiv.

[36]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[39]  Ronald M. Summers,et al.  TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).