Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-ray Reports

Chest X-Ray (CXR) images are commonly used for clinical screening and diagnosis. Automatically writing reports for these images can considerably lighten the workload of radiologists for summarizing descriptive findings and conclusive impressions. The complex structures between and within sections of the reports pose a great challenge to the automatic report generation. Specifically, the section Impression is a diagnostic summarization over the section Findings; and the appearance of normality dominates each section over that of abnormality. Existing studies rarely explore and consider this fundamental structure information. In this work, we propose a novel framework that exploits the structure information between and within report sections for generating CXR imaging reports. First, we propose a two-stage strategy that explicitly models the relationship between Findings and Impression. Second, we design a novel cooperative multi-agent system that implicitly captures the imbalanced distribution between abnormality and normality. Experiments on two CXR report datasets show that our method achieves state-of-the-art performance in terms of various evaluation metrics. Our results expose that the proposed approach is able to generate high-quality medical reports through integrating the structure information.

[1]  Chuang Gan,et al.  Recurrent Topic-Transition GAN for Visual Paragraph Generation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Vaibhava Goel,et al.  Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[6]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[7]  Ning Zhang,et al.  Deep Reinforcement Learning-Based Image Captioning with Embedding Reward , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ronald M. Summers,et al.  Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[10]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[11]  Pengtao Xie,et al.  On the Automatic Generation of Medical Imaging Reports , 2017, ACL.

[12]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[13]  Ramakanth Pasunuru,et al.  Multi-Task Video Captioning with Video and Entailment Generation , 2017, ACL.

[14]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[15]  Jiebo Luo,et al.  Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Dorian Kodelja,et al.  Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[18]  Xin Wang,et al.  Video Captioning via Hierarchical Reinforcement Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Clement J. McDonald,et al.  Preparing a collection of radiology examinations for distribution and retrieval , 2015, J. Am. Medical Informatics Assoc..

[20]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[21]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[22]  Wei Xu,et al.  Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.

[23]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  C. Lawrence Zitnick,et al.  CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Joelle Pineau,et al.  An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.

[26]  Jonathan Krause,et al.  A Hierarchical Approach for Generating Descriptive Image Paragraphs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[28]  Li Fei-Fei,et al.  DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[30]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[31]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[32]  Siqi Liu,et al.  Improved Image Captioning via Policy Gradient optimization of SPIDEr , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Eric P. Xing,et al.  Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation , 2018, NeurIPS.

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Tamer Basar,et al.  Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents , 2018, ICML.

[36]  Ronald M. Summers,et al.  TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Richard Socher,et al.  Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[39]  Wei Xu,et al.  Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).