Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Automatically generating a report from a patient's Chest X-Rays (CXRs) is a promising way to reduce clinical workload and improve patient care. However, current CXR report generators, which are predominantly encoder-to-decoder models, lack the diagnostic accuracy needed for deployment in a clinical setting. To improve CXR report generation, we investigate warm-starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-Ray datasets using natural language generation and Clinical Efficacy (CE) metrics. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are best for warm-starting the encoder and decoder, respectively. Compared to the state-of-the-art (M² Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3% for CE F-1, 1.8% for BLEU-4, 1.6% for ROUGE-L, and 1.0% for METEOR. The reports generated by CvT2DistilGPT2 are more diagnostically accurate and have a higher similarity to radiologist reports than those of previous approaches. By leveraging warm-starting, CvT2DistilGPT2 brings automatic CXR report generation one step closer to the clinical setting. CvT2DistilGPT2 and its MIMIC-CXR checkpoint are available at https://github.com/aehrc/cvt2distilgpt2.
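As a rough illustration of what warm-starting means here, the sketch below copies parameters from a pre-trained checkpoint into a freshly initialised decoder wherever the parameter name and size match, and leaves everything else at its random initialisation. It is a toy, framework-free sketch and not the CvT2DistilGPT2 implementation: the parameter names, the flat-list weights, and the helpers `random_init` and `warm_start` are all hypothetical. The cross-attention entry is kept random because a language-model checkpoint trained on text alone typically contains no weights for attending over image features.

```python
import random


def random_init(size, rng):
    """Return `size` small random weights (stand-in for a random initialiser)."""
    return [rng.gauss(0.0, 0.02) for _ in range(size)]


def warm_start(model, checkpoint):
    """Copy checkpoint weights into `model` where name and size match.

    Returns two lists: names that were warm-started, and names that
    kept their random initialisation.
    """
    copied, kept_random = [], []
    for name, weights in model.items():
        source = checkpoint.get(name)
        if source is not None and len(source) == len(weights):
            model[name] = list(source)  # copy so the checkpoint is not aliased
            copied.append(name)
        else:
            kept_random.append(name)
    return copied, kept_random


rng = random.Random(0)

# Hypothetical decoder: every parameter starts randomly initialised.
decoder = {
    "self_attention.weight": random_init(4, rng),
    "cross_attention.weight": random_init(4, rng),
    "feed_forward.weight": random_init(4, rng),
}

# Hypothetical language-model checkpoint: no cross-attention weights.
language_checkpoint = {
    "self_attention.weight": [0.1, 0.2, 0.3, 0.4],
    "feed_forward.weight": [0.5, 0.6, 0.7, 0.8],
}

copied, kept_random = warm_start(decoder, language_checkpoint)
print("warm-started:", copied)
print("randomly initialised:", kept_random)
```

In this toy setting, the self-attention and feed-forward weights come from the checkpoint, while the cross-attention weights must be learned from scratch during fine-tuning on the report generation data.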
