Are you talking to me?: Improving the Robustness of Dialogue Systems in a Multi Party HRI Scenario by Incorporating Gaze Direction and Lip Movement of Attendees

In this paper, we present our humanoid robot "Meka" participating in a multi-party human-robot dialogue scenario. Active arbitration of the robot's attention based on multi-modal stimuli is utilised to observe persons who are outside of the robot's field of view. We investigate the impact of this attention management and addressee recognition on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verifying the speaker through the detection of lip movement can be used to further increase precision. Furthermore, we show that even a rather simplistic fusion of gaze and lip-movement cues allows a considerable enhancement in addressee estimation and can be adjusted to the requirements of a particular scenario.
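To make the described fusion concrete, the sketch below shows one plausible rule-based combination of the two cues named in the abstract (mutual gaze at the end of an utterance and detected lip movement). The class names, the `strict` flag, and the decision rule are illustrative assumptions, not the authors' actual implementation; they merely demonstrate how such a fusion could be tuned towards recall or precision depending on the scenario.

```python
# Hypothetical sketch of a rule-based fusion of gaze and lip-movement cues for
# addressee estimation. All names, thresholds and the decision rule are
# assumptions for illustration, not the method reported in the paper.
from dataclasses import dataclass


@dataclass
class UtteranceObservation:
    """Per-utterance cues observed for one attendee (assumed representation)."""
    mutual_gaze_at_end: bool      # attendee looked at the robot when the utterance ended
    lip_movement_detected: bool   # attendee's lips moved while speech was heard


def is_robot_addressed(obs: UtteranceObservation, strict: bool = False) -> bool:
    """Classify an utterance as robot-directed.

    strict=False favours recall (mutual gaze alone suffices);
    strict=True favours precision (mutual gaze AND lip movement required),
    illustrating how the fusion can be adapted to a scenario's requirements.
    """
    if strict:
        return obs.mutual_gaze_at_end and obs.lip_movement_detected
    return obs.mutual_gaze_at_end


if __name__ == "__main__":
    # Example: a bystander glances at the robot but their lips do not move
    # while the speech is heard, i.e. someone else is speaking.
    obs = UtteranceObservation(mutual_gaze_at_end=True, lip_movement_detected=False)
    print(is_robot_addressed(obs))               # True  (recall-oriented setting)
    print(is_robot_addressed(obs, strict=True))  # False (precision-oriented setting)
```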
