Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation

Automatically generating radiology reports can improve current clinical practice in diagnostic radiology. On one hand, it can relieve radiologists of the heavy burden of report writing; on the other hand, it can alert radiologists to abnormalities and help avoid misdiagnoses and missed diagnoses. Yet this task remains challenging for data-driven neural networks due to severe visual and textual data biases. To this end, we propose a Posterior-and-Prior Knowledge Exploring-and-Distilling approach (PPKED) that imitates the working patterns of radiologists, who first examine the abnormal regions and assign disease topic tags to them, and then draw on years of accumulated medical knowledge and working experience to write reports. Accordingly, PPKED comprises three modules: the Posterior Knowledge Explorer (PoKE), the Prior Knowledge Explorer (PrKE), and the Multi-domain Knowledge Distiller (MKD). Specifically, PoKE explores posterior knowledge by providing explicit abnormal visual regions to alleviate visual data bias; PrKE explores prior knowledge from a prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias. The explored knowledge is then distilled by the MKD to generate the final reports. Evaluated on the MIMIC-CXR and IU-Xray datasets, our method outperforms previous state-of-the-art models on both.
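The explore-then-distill flow described above can be sketched with plain attention operations. This is a highly simplified illustration, not the paper's actual architecture: all dimensions, feature sources, and the additive fusion inside PrKE are assumptions made for the sketch.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: each query row attends over the key/value memory.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical sizes: d = feature dim, n_v visual regions, n_t disease topic tags,
# n_k knowledge-graph nodes, n_r retrieved prior-report embeddings.
d, n_v, n_t, n_k, n_r = 64, 49, 10, 20, 5
rng = np.random.default_rng(0)
visual  = rng.normal(size=(n_v, d))   # image region features (stand-in for a CNN encoder)
topics  = rng.normal(size=(n_t, d))   # disease topic tag embeddings
graph   = rng.normal(size=(n_k, d))   # medical knowledge-graph node features
reports = rng.normal(size=(n_r, d))   # embeddings of retrieved prior reports

# PoKE: topic tags query the visual regions -> explicit abnormal-region (posterior) features.
posterior = attention(topics, visual, visual)

# PrKE: posterior features query both prior-knowledge sources (fusion by sum is an assumption).
prior = attention(posterior, graph, graph) + attention(posterior, reports, reports)

# MKD: at each decoding step, the decoder state distills the explored knowledge into a context.
decoder_state = rng.normal(size=(1, d))
memory = np.concatenate([posterior, prior])
context = attention(decoder_state, memory, memory)
print(context.shape)  # (1, 64)
```

In the actual model these roles are played by learned Transformer-style modules; the sketch only shows how posterior (visual) and prior (textual) knowledge can both be folded into one decoding context.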
