TeamS at VQA-Med 2021: BBN-Orchestra for Long-tailed Medical Visual Question Answering

This work describes our (TeamS) participation in the Medical Domain Visual Question Answering challenge (VQA-Med) at ImageCLEF 2021. We reformulate the VQA problem as long-tailed multi-class image classification, in which each class is an abnormality present in a medical image. Our proposed BBN-Orchestra is an ensemble of bilateral-branch networks (BBN). It reduces overfitting to the training and validation data and models the imbalanced, long-tailed image distribution effectively. In the inference phase, BBN-Orchestra uses a voting mechanism to assign the final predicted class. Our proposed method achieved a test accuracy of 34.8% and a BLEU score of 39.1%, ranking third in the competition. Our source code is available at https://github.com/d4l-data4life/BBNOrchestra-for-VQAmed2021.
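
The abstract does not specify the exact voting rule, so the following is only a minimal sketch of one common choice, plurality (hard) voting over the per-image class predictions of several ensemble members. The function name `majority_vote`, the input layout, and the tie-breaking rule (lowest class index wins) are assumptions made for illustration and are not taken from the authors' code.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine hard class predictions from several models by plurality vote.

    predictions[m][i] is the class index that model m predicts for image i.
    Ties are broken by choosing the smallest class index (an assumption).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_images = len(predictions[0])
    if any(len(p) != n_images for p in predictions):
        raise ValueError("all models must predict the same number of images")

    final = []
    # zip(*predictions) yields, for each image, the votes of all models.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Among the most-voted classes, pick the smallest index.
        final.append(min(c for c, n in counts.items() if n == top))
    return final


if __name__ == "__main__":
    # Three hypothetical ensemble members, three images each.
    member_predictions = [
        [0, 1, 2],
        [0, 2, 2],
        [1, 1, 2],
    ]
    print(majority_vote(member_predictions))
```

In this sketch each ensemble member contributes one vote per image; weighting members or averaging class probabilities instead would be alternative designs that the abstract neither confirms nor rules out.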
