Turn-taking, feedback and joint attention in situated human-robot interaction

Abstract: In this paper, we present a study in which a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user’s and the robot’s gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We compare this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. We also manipulate turn-taking cues in the robot’s speech, such as completeness and filled pauses. By analysing the participants’ subjective ratings, task completion, verbal responses, gaze behaviour, and drawing activity, we show that users do benefit from the robot’s gaze when talking about landmarks, and that the robot’s verbal and gaze behaviour has a strong effect on the users’ turn-taking behaviour. We also present an analysis of the users’ gaze and of the lexical and prosodic realisation of their feedback after the robot’s instructions, and show that these cues reveal whether the user has already executed the previous instruction, as well as the user’s level of uncertainty.
