Paper information - kdevqa at VQA-Med 2020: Focusing on GLU-based Classification

Interpreting medical images is a challenging research problem, and interest in medical applications of artificial intelligence is growing. In particular, the ImageCLEF 2020 visual question answering (VQA) task is expected to lead to applications such as providing a second opinion. The purpose of this research is to find an effective method for a VQA-Med system. We propose neural networks that use the Gated Linear Unit (GLU) to fuse image and question features effectively. Before training, we apply pre-processing and pre-training. For pre-processing, we apply so-called "inpainting" to remove logos and text embedded in the images, with the aim of extracting image features that contain less noise. For pre-training, we use the VQA-Med 2019 dataset to train some of the weights of the proposed model. We treat the VQA task as a classification task over 332 answer classes. Our proposed model scores 0.314 in accuracy and 0.350 in BLEU on the VQA-Med 2020 task.
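The abstract names the Gated Linear Unit as the fusion mechanism and 332 as the number of answer classes, but it does not give the architecture. The following is only a minimal NumPy sketch of one way a GLU could fuse the two feature vectors before a 332-way classifier. The concatenation of the features, the feature sizes (2048 and 768), the hidden size, and all function and parameter names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

NUM_CLASSES = 332  # number of answer classes stated in the abstract


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def glu(x, w_value, b_value, w_gate, b_gate):
    """Gated Linear Unit: (x W + b) * sigmoid(x V + c), elementwise product."""
    return (x @ w_value + b_value) * sigmoid(x @ w_gate + b_gate)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(img_dim, q_dim, hidden_dim, rng):
    """Randomly initialised weights; all sizes here are hypothetical."""
    in_dim = img_dim + q_dim
    return {
        "w_value": rng.normal(0.0, 0.01, (in_dim, hidden_dim)),
        "b_value": np.zeros(hidden_dim),
        "w_gate": rng.normal(0.0, 0.01, (in_dim, hidden_dim)),
        "b_gate": np.zeros(hidden_dim),
        "w_out": rng.normal(0.0, 0.01, (hidden_dim, NUM_CLASSES)),
        "b_out": np.zeros(NUM_CLASSES),
    }


def predict_proba(img_feat, q_feat, params):
    """Fuse image and question features with a GLU, then classify."""
    # Assumption: fusion by concatenation followed by one GLU layer.
    fused_in = np.concatenate([img_feat, q_feat], axis=-1)
    hidden = glu(fused_in, params["w_value"], params["b_value"],
                 params["w_gate"], params["b_gate"])
    logits = hidden @ params["w_out"] + params["b_out"]
    return softmax(logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(img_dim=2048, q_dim=768, hidden_dim=256, rng=rng)
    img = rng.normal(size=(4, 2048))  # stand-in for CNN image features
    qst = rng.normal(size=(4, 768))   # stand-in for question embeddings
    probs = predict_proba(img, qst, params)
    print(probs.shape, probs.argmax(axis=-1))
```

The gate lets the network scale each fused feature by a value between 0 and 1 that depends on both the image and the question, which is one plausible reading of "effective fusion". The predicted answer would be the class with the highest probability.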

Masaki Aono | Hideo Umada
