Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings

Where early work on dialogue in Computational Linguistics put much emphasis on dialogue structure and its relation to the mental states of the dialogue participants (e.g., Allen 1979, Grosz & Sidner 1986), current work mostly reduces dialogue to the task of producing, at any one time, a next utterance, e.g. in neural chatbot or Visual Dialogue settings. As a methodological decision, this is sound: even the longest journey is a sequence of steps. It becomes detrimental, however, when the tasks and datasets from which dialogue behaviour is to be learned are tailored too closely to this framing of the problem. In this short note, we describe a family of settings which still allow dialogues to be kept simple, but add a constraint that makes the participants care about reaching mutual understanding. In such agreement games, there is a secondary, but explicit, goal besides the task-level goal, namely to reach mutual understanding about whether the task-level goal has been reached. As we argue, this naturally triggers meta-semantic interaction and mutual engagement, and hence leads to richer data from which to induce models.
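To make the structure of such a game concrete, the following is a minimal sketch in Python of the interaction loop. The names used here (Turn, AgreementGame, produce_utterance) are illustrative assumptions of this presentation, not an interface defined by the paper or any of the settings cited below; the one point the sketch encodes is that an episode only ends successfully once both participants currently and explicitly signal that they take the task-level goal to be reached.

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Turn:
        speaker: str
        utterance: str
        claims_done: bool = False  # explicit "I take the goal to be reached" signal

    @dataclass
    class AgreementGame:
        # Hypothetical per-player policy: maps (speaker, history) to the next Turn.
        produce_utterance: Callable[[str, List[Turn]], Turn]
        history: List[Turn] = field(default_factory=list)

        def play(self, players=("A", "B"), max_turns=50) -> List[Turn]:
            done_votes = {p: False for p in players}
            for i in range(max_turns):
                speaker = players[i % len(players)]
                turn = self.produce_utterance(speaker, self.history)
                self.history.append(turn)
                done_votes[speaker] = turn.claims_done
                # The game only ends when agreement is mutual, explicit, and
                # current; one participant's claim of success is never enough.
                if all(done_votes.values()):
                    break
            return self.history

Because neither player can end the game unilaterally, any residual doubt about whether the goal has been reached gives either side a reason to ask back or to initiate repair rather than to stop, and it is exactly this meta-semantic interaction that the recorded dialogues then contain.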

[1] Margaret Mitchell et al. VQA: Visual Question Answering. International Journal of Computer Vision, 2015.

[2] Joelle Pineau et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models. AAAI, 2015.

[3] Hugo Larochelle et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[4] Herbert H. Clark. Using Language. Cambridge University Press, 1996.

[5] Donald A. Norman. The Design of Everyday Things. 1988.

[6] David Schlangen et al. MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment. arXiv, 2019.

[7] David Schlangen. Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics. IWCS, 2019.

[8] E. Schegloff et al. A Simplest Systematics for the Organization of Turn-Taking for Conversation. Language, 1974.

[9] Akiko Aizawa et al. A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context. AAAI, 2019.

[10] M. Hayashi et al. Conversational Repair and Human Understanding. 2013.

[11] Barbara J. Grosz and Candace L. Sidner. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 1986.

[12] C. Raymond Perrault et al. Plans, Inference, and Indirect Speech Acts. ACL, 1979.

[13] E. Schegloff. Sequence Organization in Interaction. 2007.

[14] José M. F. Moura et al. Visual Dialog. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

[15] Stefan Lee et al. Evaluating Visual Conversational Agents via Cooperative Human-AI Games. HCOMP, 2017.

[16] David Traum. Computational Models of Grounding in Collaborative Systems. 1999.

[17] David DeVault et al. PentoRef: A Corpus of Spoken References in Task-oriented Dialogues. LREC, 2016.

[18] José M. F. Moura et al. Visual Coreference Resolution in Visual Dialog using Neural Module Networks. ECCV, 2018.

[19] Gabriel Skantze. Exploring Human Error Recovery Strategies: Implications for Spoken Dialogue Systems. Speech Communication, 2005.

[20] Elia Bruni et al. The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue. ACL, 2019.

[21] Martin Heckmann et al. From Explainability to Explanation: Using a Dialogue Setting to Elicit Annotations with Justifications. SIGdial, 2019.