Task-based evaluation of context-sensitive referring expressions in human–robot dialogue

The standard referring-expression generation task involves creating stand-alone descriptions intended solely to distinguish a target object from its context. When an artificial system refers to objects during interactive, embodied dialogue with a human partner, however, the setting is very different: references in situated dialogue can take into account aspects of the physical, interactive and task-level context, and are therefore unlike those found in corpora of stand-alone references. In addition, the dominant method of evaluating generated references is to measure corpus similarity. In an interactive context, extrinsic measures such as task success and user preference are more relevant, and numerous studies have found little or no correlation between such extrinsic measures and the predictions of commonly used corpus-similarity metrics. To explore these issues, we introduce a humanoid robot designed to cooperate with a human partner on a joint construction task. We then describe the context-sensitive reference-generation algorithm implemented for this robot, which was inspired by the referring phenomena found in the Joint Construction Task corpus of human–human joint construction dialogues. The context-sensitive algorithm was evaluated in two user studies comparing it to a baseline algorithm, using a combination of objective performance measures and subjective user-satisfaction scores. In both studies, objective task performance and dialogue quality were found to be the same for both versions of the system; in both, however, the context-sensitive system scored higher on subjective measures of interaction quality.
