Detecting Address Estimation Errors from Users' Reactions in Multi-user Agent Conversation

Embodied conversational agents are gradually being deployed in real-world applications such as guides in museums and exhibitions. In these applications, the agent must identify the addressee of each user utterance in order to generate appropriate responses when interacting with groups of visitors. However, as long as the addressee identification mechanism is not perfectly accurate, the agent will make errors in its responses. Once an error occurs, the agent's hypothesis about the conversation collapses, and the subsequent decision-making path may diverge in an entirely different direction. We are developing a mechanism to detect such errors from the users' reactions and a mechanism to recover from them. This paper presents the first step: a method for detecting laughter, surprise, and confused facial expressions that follow the agent's erroneous responses. The method is based on machine learning, trained on user reactions collected in a Wizard of Oz (WOZ) experiment, and achieved an accuracy of over 90%.
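The abstract does not specify the classifier or features used. As a minimal sketch of the kind of pipeline it describes, the following trains a classifier to label user-reaction windows as following a wrong agent response or not. Everything here is hypothetical: the random forest (via scikit-learn) stands in for whichever learner the authors actually used, and the facial-expression features (e.g., smile and brow-raise intensities) are placeholder stand-ins for the reaction data collected in the WOZ experiment.

```python
# Hypothetical sketch of the reaction-classification setup described in the
# abstract: classify user-reaction windows (laughter, surprise, confusion)
# as following a wrong agent response or not. Features and data are
# synthetic placeholders, not the paper's actual dataset or model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# One row per reaction window; columns are placeholder facial-expression
# features, e.g. smile intensity, brow raise, head motion, gaze aversion.
X = rng.random((200, 4))

# Label: 1 if the window followed a wrong agent response, else 0.
y = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean 5-fold cross-validation accuracy: {scores.mean():.2f}")
```

With real labeled reaction data in place of the synthetic arrays, the cross-validation score plays the role of the 90%+ accuracy figure reported in the abstract.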
