MoBVQA: A Modality-Based Medical Image Visual Question Answering System

This paper presents our work on medical image visual question answering (VQA), carried out on the ImageCLEF 2019 medical VQA dataset. Visual question answering is a task in which an image and a related question are given as input to the machine, which must produce a correct answer to the question as output. In our problem, both the input image and the question come from the medical domain. In medical imaging, VQA has applications such as providing radiologists with a second opinion on their analysis of an image; it can also give patients basic information about an image without consulting a doctor. We consider the problem of answering modality-based questions for medical images such as X-ray, computed tomography (CT), ultrasound (US), and magnetic resonance imaging (MRI). Our approach uses a convolutional neural network (CNN) to classify the input image into its modality class and then generates the answer from the CNN output. The proposed model achieves a test accuracy of 83.8%, which is comparable with the state of the art.
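As an illustration of the modality-classification pipeline described above, the following minimal sketch classifies an image into one of four assumed modality classes with a CNN and returns the predicted class as the answer. The ResNet-18 backbone, label set, preprocessing, and answer mapping are assumptions for illustration only, not the paper's exact configuration.

```python
# Minimal sketch of modality-based answering (illustrative, not the authors'
# exact architecture): a CNN predicts the modality class, and the answer to a
# modality question is generated from that prediction.

import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

MODALITIES = ["x-ray", "ct", "ultrasound", "mri"]  # assumed label set

# Assumed backbone: a pretrained ResNet-18 whose final layer is replaced
# for 4-way modality classification.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(MODALITIES))
model.eval()

# Standard ImageNet-style preprocessing (an assumption).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def answer_modality_question(image_path: str) -> str:
    """Classify the image's modality and return it as the answer."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)
    predicted = MODALITIES[logits.argmax(dim=1).item()]
    # e.g. returns "ct" for the question "what imaging modality is shown?"
    return predicted
```

In a full system, the classifier would first be fine-tuned on modality-labelled training images, and the predicted class would be mapped to the answer vocabulary of the VQA dataset.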