Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

Goal-oriented dialog has received attention due to its numerous applications in artificial intelligence. A goal-oriented dialog task arises when a questioner asks action-oriented questions and an answerer responds with the intent of letting the questioner know the correct action to take. To ask adequate questions, deep learning and reinforcement learning have recently been applied. However, these approaches struggle to learn a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose "Answerer in Questioner's Mind" (AQM), a novel information-theoretic algorithm for goal-oriented dialog. With AQM, the questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer's intention by selecting a plausible question, explicitly calculating the information gain over the candidate intentions and the possible answers to each question. We test our framework on two goal-oriented visual dialog tasks: "MNIST Counting Dialog" and "GuessWhat?!". In our experiments, AQM outperforms comparative algorithms by a large margin.

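A minimal sketch of the information-gain-based question selection the abstract describes, not the authors' implementation: the questioner maintains a belief over candidate intentions, uses an approximate answerer model p(a | c, q) to score each candidate question by mutual information, and Bayes-updates its belief after each answer. The array shapes, function names, and discrete question/answer pools are assumptions made for illustration.

```python
import numpy as np

def information_gain(prior, answer_model):
    """prior: (n_classes,) belief p(c); answer_model: (n_classes, n_answers) p(a | c, q).
    Returns I(C; A | q) = H(A | q) - sum_c p(c) H(A | c, q), in nats."""
    eps = 1e-12
    p_a = prior @ answer_model                       # marginal answer distribution p(a | q)
    h_a = -np.sum(p_a * np.log(p_a + eps))           # H(A | q)
    h_a_given_c = -np.sum(answer_model * np.log(answer_model + eps), axis=1)
    return h_a - prior @ h_a_given_c                 # mutual information for this question

def select_question(prior, answer_models):
    """answer_models: (n_questions, n_classes, n_answers); returns the index of the
    candidate question with maximal information gain under the current belief."""
    gains = [information_gain(prior, m) for m in answer_models]
    return int(np.argmax(gains))

def update_posterior(prior, answer_model, observed_answer):
    """Bayes update of the belief over candidate intentions after observing an answer."""
    posterior = prior * answer_model[:, observed_answer]
    return posterior / posterior.sum()
```

In a dialog loop, `select_question` and `update_posterior` would alternate until the belief concentrates on one candidate, at which point the questioner makes its guess.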