Decision-Theoretic Question Generation for Situated Reference Resolution: An Empirical Study and Computational Model

Dialogue agents that interact with humans in situated environments need to manage referential ambiguity across multiple modalities and ask for help when needed. However, it is not clear what kinds of questions such agents should ask, or how the answers to those questions can be used to resolve ambiguity. To address this, we analyzed dialogue data from an interactive study in which participants controlled a virtual robot tasked with organizing a set of tools while engaging in dialogue with a live, remote experimenter. The analysis yielded several novel results, including the distribution of question types used to resolve ambiguity and the influence of dialogue-level factors on the reference resolution process. Based on these empirical findings, we (1) developed a computational model for clarification requests using a decision network with an entropy-based utility assignment method that operates across modalities, (2) evaluated the model, showing that it outperforms a slot-filling baseline in environments of varying ambiguity, and (3) interpreted the results to offer insight into the ways that agents can ask questions to facilitate situated reference resolution.
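
To make the idea of an entropy-based utility for clarification requests concrete, the following is a minimal sketch and not the model described in the paper. It assumes a belief (probability distribution) over candidate referents, questions that each ask about one object attribute, truthful answers, and a fixed cost per question. The function names, the attribute set, the example objects, and the `ask_cost` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def entropy(belief):
    """Shannon entropy in bits of a dict mapping candidate -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_posterior_entropy(belief, objects, attribute):
    """Expected entropy after asking about `attribute` (assumes a truthful answer)."""
    # Group candidates by the answer they would produce for this attribute.
    groups = defaultdict(dict)
    for name, p in belief.items():
        groups[objects[name][attribute]][name] = p

    expected = 0.0
    for members in groups.values():
        p_answer = sum(members.values())
        if p_answer == 0:
            continue  # this answer cannot occur under the current belief
        posterior = {n: p / p_answer for n, p in members.items()}
        expected += p_answer * entropy(posterior)
    return expected


def choose_question(belief, objects, attributes, ask_cost=0.1):
    """Return (attribute, utility) for the best question, or (None, 0.0).

    Utility here is expected entropy reduction minus a hypothetical fixed
    cost of asking; no question is chosen if no utility is positive.
    """
    current = entropy(belief)
    best, best_utility = None, 0.0
    for attr in attributes:
        gain = current - expected_posterior_entropy(belief, objects, attr)
        utility = gain - ask_cost
        if utility > best_utility:
            best, best_utility = attr, utility
    return best, best_utility


# Hypothetical scene: three tools that all match an ambiguous reference.
objects = {
    "wrench_1": {"color": "red", "location": "shelf"},
    "wrench_2": {"color": "red", "location": "table"},
    "wrench_3": {"color": "blue", "location": "floor"},
}
belief = {name: 1 / 3 for name in objects}

best_attr, best_utility = choose_question(belief, objects, ["color", "location"])
print(best_attr)  # location
```

In this toy scene, asking about location separates all three candidates, whereas asking about color leaves two red tools indistinguishable, so the location question has the higher utility. When the belief is already concentrated on one candidate, no question has positive utility and the sketch returns `None`, meaning the agent would act without asking.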
