VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019

This paper presents an overview of the Medical Visual Question Answering task (VQA-Med) at ImageCLEF 2019. Participating systems were tasked with answering medical questions based on the visual content of radiology images. In this second edition of VQA-Med, we focused on four categories of clinical questions: Modality, Plane, Organ System, and Abnormality. These categories differ in difficulty and lend themselves to both classification and text generation approaches. We also ensured that all questions can be answered from the image content alone, without requiring additional medical knowledge or domain-specific inference. Following these guidelines, we created a new dataset of 4,200 radiology images and 15,292 question-answer pairs. The challenge was well received, with 17 participating teams applying a wide range of approaches such as transfer learning, multi-task learning, and ensemble methods. The best team achieved a BLEU score of 64.4% and an accuracy of 62.4%. In future editions, we will consider designing more goal-oriented datasets and tackling new aspects such as contextual information and domain-specific inference.
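To make the two reported metrics concrete, the following is a minimal Python sketch of exact-match accuracy and a simplified sentence-level BLEU. It is not the official evaluation script: the function names, the normalization (lowercasing and punctuation stripping), and the choice to cap the n-gram order at the length of the shorter answer are all assumptions made for illustration, since medical answers are often only one or two words long.

```python
import math
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace (assumed preprocessing)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose normalized tokens equal the reference's."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(prediction, reference, max_n=4):
    """Simplified BLEU against a single reference, without smoothing.

    Assumption: the highest n-gram order is capped at the length of the
    shorter answer so that short answers are not scored as zero by default.
    """
    cand, ref = normalize(prediction), normalize(reference)
    if not cand or not ref:
        return 0.0
    top = min(max_n, len(cand), len(ref))
    log_sum = 0.0
    for n in range(1, top + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_sum += math.log(overlap / sum(cand_counts.values()))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum / top)


if __name__ == "__main__":
    preds = ["MRI", "axial", "ct"]          # hypothetical system answers
    golds = ["mri", "sagittal", "CT scan"]  # hypothetical reference answers
    print(exact_match_accuracy(preds, golds))
    print(sum(sentence_bleu(p, g) for p, g in zip(preds, golds)) / len(golds))
```

Accuracy rewards only answers that match the reference exactly after normalization, while BLEU gives partial credit for overlapping words, which is why a system's two scores can differ.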
