MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

In this paper, we consider the problem of enhancing self-supervised visual-language pre-training (VLP) with medical-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel report filter to extract the medical entities, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel entity embedding module that queries an external knowledge description base, to exploit the rich context of additional information that the medical domain affords, and to implicitly build relationships between entities in the language embedding space; Third, we propose a novel Transformer-based fusion model that spatially aligns the entity descriptions with visual signals at the image patch level using only self-supervised learning, thus enabling spatial grounding; Fourth, we conduct thorough experiments to validate the effectiveness of our proposed architecture, and evaluate on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
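To make the patch-level alignment idea concrete, the following is a minimal sketch of single-head cross-attention in which each entity embedding acts as a query over image patch features. It is not the authors' implementation: the function name `cross_attend`, the dimensions, and the random vectors standing in for entity-description embeddings and patch features are all assumptions for illustration, and the learned projections and stacked layers of a real Transformer decoder are omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(entity_queries, patch_features):
    """Single-head cross-attention without learned projections (illustrative only).

    Each entity query attends over all image patches. Returns one attention
    map per entity (weights over patches) and one fused feature per entity.
    """
    dim = len(patch_features[0])
    scale = 1.0 / math.sqrt(dim)
    attn_maps, fused = [], []
    for q in entity_queries:
        # Scaled dot-product score between this entity and every patch.
        scores = [scale * sum(qi * pi for qi, pi in zip(q, p)) for p in patch_features]
        weights = softmax(scores)
        attn_maps.append(weights)
        # Attention-weighted sum of patch features.
        fused.append(
            [sum(w * p[d] for w, p in zip(weights, patch_features)) for d in range(dim)]
        )
    return attn_maps, fused


# Hypothetical sizes: 3 entities, a 4x4 grid of patches, 8-dimensional features.
rng = random.Random(0)
num_entities, num_patches, dim = 3, 16, 8
entity_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_entities)]
patch_features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]

attn_maps, fused = cross_attend(entity_queries, patch_features)
```

In this sketch, each row of `attn_maps` can be reshaped to the patch grid and read as a coarse localization map for the corresponding entity, while each row of `fused` could feed a per-entity classifier. Both readings are illustrative interpretations of the abstract rather than details confirmed by it.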
