Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and visual recognition, their use in the medical domain remains limited because medical tasks are fine-grained and demand substantial domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD), which leverages existing medical domain knowledge to guide vision-language pre-training on paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully supervised models but also superior, with statistical significance, to the average of three expert radiologists for three of the five pathologies evaluated. Moreover, when few-shot annotations are available, KAD outperforms all existing approaches in the fine-tuning setting, demonstrating its potential for application in different clinical scenarios.
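To make "zero-shot" concrete: after vision-language pre-training, a pathology can be scored without any labelled training images by comparing an image embedding with text embeddings that describe the finding. The sketch below is a generic CLIP-style scheme, not KAD's own architecture. It assumes image and text encoders that map into a shared embedding space, and it uses one hypothetical positive/negative prompt pair per pathology (e.g. "pleural effusion" vs "no pleural effusion"). The function names, toy vectors, and the temperature value of 0.07 are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(
    image_emb: Sequence[float],
    prompt_pairs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> Dict[str, float]:
    """Hypothetical zero-shot scorer (generic CLIP-style, not KAD itself).

    For each pathology, apply a two-way softmax over the temperature-scaled
    similarities to its positive and negative prompt embeddings, and return
    the probability assigned to the positive prompt.
    """
    probs: Dict[str, float] = {}
    for disease, (pos_emb, neg_emb) in prompt_pairs.items():
        s_pos = cosine(image_emb, pos_emb) / temperature
        s_neg = cosine(image_emb, neg_emb) / temperature
        # exp(s_pos) / (exp(s_pos) + exp(s_neg)), written in sigmoid form.
        probs[disease] = 1.0 / (1.0 + math.exp(s_neg - s_pos))
    return probs


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for real encoder outputs.
    image = [1.0, 0.0, 0.0]
    prompts = {
        "pleural effusion": ([0.9, 0.1, 0.0], [-0.9, 0.1, 0.0]),
        "pneumothorax": ([0.0, 1.0, 0.0], [0.0, -1.0, 0.0]),
    }
    for name, p in zero_shot_probs(image, prompts).items():
        print(f"{name}: {p:.3f}")
```

Scoring each pathology against its own prompt pair, rather than taking a single softmax across all disease names, keeps the prediction multi-label, which suits chest X-rays where several findings can co-occur.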
