Slake: A Semantically-Labeled Knowledge-Enhanced Dataset For Medical Visual Question Answering

Medical visual question answering (Med-VQA) has tremendous potential in healthcare. However, the development of this technology is hindered by the lacking of publicly-available and high-quality labeled datasets for training and evaluation. In this paper, we present a large bilingual dataset, SLAKE, with comprehensive semantic labels annotated by experienced physicians and a new structural medical knowledge base for Med-VQA. Besides, SLAKE includes richer modalities and covers more human body parts than the currently available dataset. We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems. The dataset can be downloaded from http://www.med-vqa.com/slake.

[1]  Ronald M. Summers,et al.  A large annotated medical image dataset for the development and evaluation of segmentation algorithms , 2019, ArXiv.

[2]  Yash Goyal,et al.  Yin and Yang: Balancing and Answering Binary Visual Questions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Asma Ben Abacha,et al.  Descriptor : A dataset of clinically generated visual questions and answers about radiology images , 2018 .

[4]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[5]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[8]  Qi Wu,et al.  FVQA: Fact-Based Visual Question Answering , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Dhruv Batra,et al.  Analyzing the Behavior of Visual Question Answering Models , 2016, EMNLP.

[10]  Guido Gerig,et al.  User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability , 2006, NeuroImage.

[11]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bo Liu,et al.  Medical Visual Question Answering via Conditional Reasoning , 2020, ACM Multimedia.

[13]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Thanh-Toan Do,et al.  Overcoming Data Limitation in Medical Visual Question Answering , 2019, MICCAI.