kdevqa at VQA-Med 2020: Focusing on GLU-based Classification

Interpreting medical images is a challenging research problem, and interest in medical applications of artificial intelligence is growing. In particular, the ImageCLEF 2020 visual question answering (VQA) task is expected to support applications such as providing a second opinion. The purpose of this research is to find an effective method for a VQA-Med system. We propose neural networks that use the Gated Linear Unit (GLU) to fuse image and question features effectively. Before training, we preprocess the data and pre-train the model. We apply so-called “inpainting” to remove logos and text embedded in the images, so that the extracted image features contain less noise. We also use the VQA-Med 2019 dataset to train some of the weights of the proposed model. We treat the VQA task as a 332-class classification task. Our proposed model achieves an accuracy of 0.314 and a BLEU score of 0.350 on the VQA-Med 2020 task.
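
To make the idea of GLU-based fusion concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that image and question feature vectors are concatenated, passed through a single Gated Linear Unit of the form (xW + b) ⊙ σ(xV + c), and then mapped to 332 answer classes by a linear layer with softmax. The feature sizes (2048 and 768), the hidden size (512), the batch size, and the random weights are hypothetical placeholders; only the 332-class output comes from the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, W, b, V, c):
    """Gated Linear Unit: a linear projection gated element-wise by a sigmoid."""
    return (x @ W + b) * sigmoid(x @ V + c)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical sizes; only N_CLASSES = 332 is taken from the abstract.
IMG_DIM, Q_DIM, HIDDEN, N_CLASSES, BATCH = 2048, 768, 512, 332, 4

# Stand-ins for features from an image encoder and a question encoder.
img_feat = rng.standard_normal((BATCH, IMG_DIM))
q_feat = rng.standard_normal((BATCH, Q_DIM))

# Assumed fusion strategy: concatenate both modalities, then apply one GLU.
fusion_input = np.concatenate([img_feat, q_feat], axis=-1)
in_dim = IMG_DIM + Q_DIM

W = rng.standard_normal((in_dim, HIDDEN)) * 0.01  # value projection
V = rng.standard_normal((in_dim, HIDDEN)) * 0.01  # gate projection
b = np.zeros(HIDDEN)
c = np.zeros(HIDDEN)
W_out = rng.standard_normal((HIDDEN, N_CLASSES)) * 0.01
b_out = np.zeros(N_CLASSES)

fused = glu(fusion_input, W, b, V, c)     # shape (BATCH, HIDDEN)
probs = softmax(fused @ W_out + b_out)    # shape (BATCH, N_CLASSES)
pred = probs.argmax(axis=-1)              # predicted answer index per example
```

The gate lets the network scale each fused feature between 0 and 1, so information from either modality can be suppressed or passed through per dimension. In a trained model the weights would be learned and the predicted index would be mapped back to one of the 332 candidate answers.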
