Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

Answerer in Questioner's Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems. AQM benefits from asking a question that would maximize the information gain when it is asked. However, due to its intrinsic nature of explicitly calculating the information gain, AQM has a limitation when the solution space is very large. To address this, we propose AQM+ that can deal with a large-scale problem and ask a question that is more coherent to the current context of the dialog. We evaluate our method on GuessWhich, a challenging task-oriented visual dialog problem, where the number of candidate classes is near 10K. Our experimental results and ablation studies show that AQM+ outperforms the state-of-the-art models by a remarkable margin with a reasonable approximation. In particular, the proposed AQM+ reduces more than 60% of error as the dialog proceeds, while the comparative algorithms diminish the error by less than 6%. Based on our results, we argue that AQM+ is a general task-oriented dialog algorithm that can be applied for non-yes-or-no responses.

[1]  Jianfeng Gao,et al.  End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.

[2]  Bohyung Han,et al.  Visual Reference Resolution using Attention Memory for Visual Dialog , 2017, NIPS.

[3]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[4]  Zhou Yu,et al.  Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog , 2018, SIGDIAL Conference.

[5]  Ali Farhadi,et al.  Neural Speed Reading via Skim-RNN , 2017, ICLR.

[6]  Maxine Eskénazi,et al.  Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning , 2016, SIGDIAL Conference.

[7]  Stefan Lee,et al.  Evaluating Visual Conversational Agents via Cooperative Human-AI Games , 2017, HCOMP.

[8]  Julien Perez,et al.  Gated End-to-End Memory Networks , 2016, EACL.

[9]  Jason Weston,et al.  Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[10]  Svetlana Lazebnik,et al.  Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Qi Wu,et al.  Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards , 2017, ArXiv.

[12]  Christopher D. Manning,et al.  A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue , 2017, EACL.

[13]  Hal Daumé,et al.  Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information , 2018, ACL.

[14]  Steve Renals,et al.  Dynamic Evaluation of Neural Sequence Models , 2017, ICML.

[15]  Dimitris N. Metaxas,et al.  Interactive Reinforcement Learning for Object Grounding via Self-Talking , 2017, ArXiv.

[16]  Kallirroi Georgila,et al.  An ISU Dialogue System Exhibiting Reinforcement Learning of Dialogue Policies: Generic Slot-Filling in the TALK In-car System , 2006, EACL.

[17]  José M. F. Moura,et al.  Visual Coreference Resolution in Visual Dialog using Neural Module Networks , 2018, ECCV.

[18]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[19]  Jung-Woo Ha,et al.  NSML: Meet the MLaaS platform with a real-world case study , 2018, ArXiv.

[20]  Yuandong Tian,et al.  CoDraw: Visual Dialog for Collaborative Drawing , 2017, ArXiv.

[21]  Jiasen Lu,et al.  Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model , 2017, NIPS.

[22]  Hugo Larochelle,et al.  GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Gongshen Liu,et al.  Sliced Recurrent Neural Networks , 2018, COLING.

[24]  Daniel Jurafsky,et al.  A Simple, Fast Diverse Decoding Algorithm for Neural Generation , 2016, ArXiv.

[25]  Maxine Eskénazi,et al.  Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability , 2017, SIGDIAL Conference.

[26]  Quoc V. Le,et al.  QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension , 2018, ICLR.

[27]  Maxine Eskénazi,et al.  Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation , 2018, ACL.

[28]  Xinlei Chen,et al.  CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication , 2017, ACL.

[29]  Byoung-Tak Zhang,et al.  Criteria for Human-Compatible AI in Two-Player Vision-Language Tasks , 2017, LaCATODA@IJCAI.

[30]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[31]  Jannis Bulian,et al.  Ask the Right Questions: Active Question Reformulation with Reinforcement Learning , 2017, ICLR.

[32]  Jung-Woo Ha,et al.  NSML: A Machine Learning Platform That Enables You to Focus on Your Models , 2017, ArXiv.

[33]  José M. F. Moura,et al.  Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog , 2017, EMNLP.

[34]  Mitesh M. Khapra,et al.  Graph Convolutional Network with Sequential Attention for Goal-Oriented Dialogue Systems , 2019, Transactions of the Association for Computational Linguistics.

[35]  Ali Farhadi,et al.  Query-Reduction Networks for Question Answering , 2016, ICLR.

[36]  Stefan Lee,et al.  Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[38]  Yu-Jung Heo,et al.  Answerer in Questioner's Mind for Goal-Oriented Visual Dialogue , 2018, ArXiv.

[39]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[40]  Olivier Pietquin,et al.  End-to-end optimization of goal-driven and visually grounded dialogue systems , 2017, IJCAI.

[41]  Aaron C. Courville,et al.  MINE: Mutual Information Neural Estimation , 2018, ArXiv.

[42]  Ashwin K. Vijayakumar,et al.  Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models , 2016, ArXiv.