Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions

Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves to gather information for oneself, but at the same time it signals to the speaker that they are being heard. To deduce whether our interlocutor is listening to us, we rely on reading their nonverbal cues, much as we use nonverbal cues to signal our own attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use nonverbal cues in a multi-party listening context not only deepens our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper brings together previous analyses of listener behavior in human-human multi-party interaction and provides novel insights into the gaze patterns between the listeners in particular. We investigate whether the gaze patterns and feedback behavior observed in human-human dialogue are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we implement an attentive listening system that generates multimodal listening behavior based on our human-human analysis. We compare our system to a baseline system that does not differentiate between listener types in its behavior generation, and evaluate it in terms of the participants' perception of the robot, their behavior, as well as the perception of third-party observers.
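To make the contrast between the two conditions concrete, the sketch below illustrates one possible way a rule-based behavior generator could condition gaze and feedback on listener type, while a baseline applies the same policy to everyone. This is a minimal illustration only: the listener-type labels, gaze probabilities, and backchannel rates are assumptions made for exposition, not the system described above.

    # Illustrative sketch only; all labels and numbers are assumed for exposition.
    import random
    from dataclasses import dataclass

    @dataclass
    class ListenerState:
        listener_type: str          # assumed labels: "addressee", "side-participant", "bystander"
        speaker: str                # participant currently holding the floor
        other_listeners: list       # the remaining participants

    def engagement_aware_behavior(state: ListenerState) -> dict:
        """Differentiates behavior by listener type (assumed policy)."""
        if state.listener_type == "addressee":
            # Addressees mostly gaze at the speaker and give frequent feedback.
            gaze = state.speaker if random.random() < 0.8 else random.choice(state.other_listeners)
            backchannel = random.random() < 0.4        # e.g., "mm-hmm" or a head nod
        elif state.listener_type == "side-participant":
            # Side participants split gaze between speaker and others, feedback is rarer.
            gaze = state.speaker if random.random() < 0.5 else random.choice(state.other_listeners)
            backchannel = random.random() < 0.15
        else:  # bystander
            gaze = random.choice([state.speaker] + state.other_listeners)
            backchannel = False
        return {"gaze_target": gaze, "backchannel": backchannel}

    def baseline_behavior(state: ListenerState) -> dict:
        """Baseline: the same policy regardless of listener type."""
        gaze = state.speaker if random.random() < 0.7 else random.choice(state.other_listeners)
        return {"gaze_target": gaze, "backchannel": random.random() < 0.25}

    if __name__ == "__main__":
        state = ListenerState("side-participant", speaker="P1", other_listeners=["P2"])
        print(engagement_aware_behavior(state))
        print(baseline_behavior(state))

In an actual system the decision would of course be driven by perceived gaze, speech activity, and feedback cues from the human participants rather than by fixed probabilities; the sketch only shows where listener-type differentiation enters the behavior generation.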
