Paper Information - A Question Type Driven Framework to Diversify Visual Question Generation

Visual question generation aims to ask questions about an image automatically. Existing work on this topic usually generates a single question per image and does not consider diversity. In this paper, we propose a question type driven framework that produces multiple questions with different focuses for a given image. In our framework, each question is constructed under the guidance of a sampled question type in a sequence-to-sequence fashion. To diversify the generated questions, a novel conditional variational auto-encoder is introduced to generate multiple questions for a specific question type. Moreover, we design a strategy for learning the question type distribution of each image, which is used to select the final questions. Experimental results on three benchmark datasets show that our framework outperforms state-of-the-art approaches in terms of both relevance and diversity.
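
As a rough illustration of the conditional variational auto-encoder mentioned in the abstract, the generic CVAE training objective (an evidence lower bound) is sketched below. The notation is assumed here for illustration and is not taken from the paper: $I$ is the image, $t$ the sampled question type, $y$ the question, and $z$ a latent variable. The objective actually used in the paper may differ.

```latex
% Generic CVAE evidence lower bound, conditioned on image I and question type t.
% All symbols are illustrative assumptions, not the paper's notation.
\log p_\theta(y \mid I, t)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid y, I, t)}\!\left[ \log p_\theta(y \mid z, I, t) \right]
  \;-\;
  \mathrm{KL}\!\left( q_\phi(z \mid y, I, t) \,\|\, p_\theta(z \mid I, t) \right)
```

Under this formulation, drawing several values of $z$ from the prior $p_\theta(z \mid I, t)$ at test time and decoding each one would yield multiple distinct questions for the same image and question type.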
