Diversity Learning Based on Multi-Latent Space for Medical Image Visual Question Generation

Auxiliary clinical diagnosis has been studied as a way to address the uneven and insufficient distribution of clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to involve intelligent systems more deeply in the diagnosis process is a growing concern. An interactive automated clinical diagnosis that combines a question-answering system with a question generation system can capture a patient's condition from multiple perspectives with less physician involvement, asking different questions to drive and guide the diagnosis. This process requires diverse information so that the patient can be evaluated from different perspectives and an accurate diagnosis obtained. Recently proposed medical question generation systems have not considered diversity. We therefore propose a diversity learning-based visual question generation model that uses a multi-latent space to generate informative question sets from medical images. The proposed method generates varied questions by embedding visual and language information in different latent spaces, whose diversity is trained with our newly proposed loss. We also add control over the categories of the generated questions, which makes the generated questions directional. Furthermore, we use a new metric, named similarity, to accurately evaluate the proposed model's performance. Experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and can generate datasets in place of manual annotation, which incurs huge labor costs.
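The abstract does not define the similarity metric. As a minimal sketch, assuming it measures how close the questions in a generated set are to one another, one plausible reading is the mean pairwise cosine similarity between sentence embeddings of the questions, where a lower value indicates a more diverse set. The function names and the use of cosine similarity are assumptions made for illustration, not the authors' definition.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of question embeddings.

    Hypothetical stand-in for a set-level similarity score: lower values
    mean the generated questions are more diverse.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings (assumed; in practice these would come from a sentence encoder).
redundant_set = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse_set = [[1.0, 0.0], [0.0, 1.0]]
print(mean_pairwise_similarity(redundant_set))
print(mean_pairwise_similarity(diverse_set))
```

Under this assumed reading, a question set that repeats the same content scores near 1, while a set whose questions are unrelated to each other scores near 0.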
