Exploring Turn-taking Cues in Multi-party Human-robot Discussions about Objects

In this paper, we present a dialogue system that was exhibited at the Swedish National Museum of Science and Technology. Two visitors at a time could play a collaborative card sorting game together with the robot head Furhat, with the three players discussing the solution together. The cards are shown on a touch table between the players, thus constituting a target for joint attention. We describe how the system was implemented to manage turn-taking and attention to users and objects in the shared physical space. We also discuss how multi-modal redundancy (from speech, card movements and head pose) is exploited to maintain meaningful discussions, given that the system has to process conversational speech from both children and adults in a noisy environment. Finally, we present an analysis of 373 interactions, in which we investigate the robustness of the system, the extent to which the system's attention can shape the users' turn-taking behaviour, and how the system can produce multi-modal turn-taking signals (filled pauses, facial gestures, breath and gaze) to bridge its own processing delays.
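
To make the turn-taking and attention management concrete, the sketch below illustrates one way such a loop could be organised: the robot averts its gaze to the shared table while its response is still being prepared, produces turn-holding cues (filled pauses, facial gestures, breath, gaze aversion) to bridge the processing delay, and then addresses a chosen user by gaze before speaking. This is a minimal Python sketch under our own assumptions, not the system's actual implementation; all names (`TurnManager`, `Target`, `respond`, the cue labels) are illustrative placeholders.

```python
import random
import time
from enum import Enum, auto


class Target(Enum):
    """Gaze targets in the shared space: the two visitors and the touch table."""
    LEFT_USER = auto()
    RIGHT_USER = auto()
    TABLE = auto()


class TurnManager:
    """Hypothetical turn-taking loop for a three-party card sorting game."""

    # Turn-holding cues of the kind discussed in the paper; the labels are illustrative.
    HOLD_CUES = ("filled_pause", "facial_gesture", "breath", "gaze_aversion")

    def attend(self, target: Target) -> None:
        """Direct the robot's gaze to a user or to the cards on the table."""
        print(f"[gaze] {target.name}")

    def hold_turn(self) -> None:
        """Emit one turn-holding cue while the next utterance is being prepared."""
        print(f"[cue]  {random.choice(self.HOLD_CUES)}")

    def respond(self, addressee: Target, response_ready, speak) -> None:
        """Claim the floor, bridge the processing delay, then address a user.

        `response_ready` and `speak` are placeholders standing in for a
        (possibly slow) understanding/generation pipeline.
        """
        self.attend(Target.TABLE)        # looking at the shared object signals an upcoming turn
        while not response_ready():      # speech processing may lag behind the conversation
            self.hold_turn()             # fill the gap instead of going silent
            time.sleep(0.4)
        self.attend(addressee)           # gaze selects the next addressee
        speak()


if __name__ == "__main__":
    tm = TurnManager()
    ready_at = time.time() + 1.0         # simulate a one-second processing delay
    tm.respond(
        addressee=Target.RIGHT_USER,
        response_ready=lambda: time.time() >= ready_at,
        speak=lambda: print("[tts]  'What do you two think about this card?'"),
    )
```

In a real system the polling loop would be replaced by event-driven callbacks from the speech and vision components; the point of the sketch is only that turn-holding cues and gaze shifts are produced explicitly while the response is pending, rather than leaving a silent gap.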
