Making virtual conversational agent aware of the addressee of users' utterances in multi-user conversation using nonverbal information

In multi-user human-agent interaction, the agent should respond when an utterance is addressed to it; to do so, it must judge whether each utterance is addressed to the agent or to another user. This study proposes a method for estimating the addressee from the prosodic features of the user's speech and head direction (an approximation of gaze direction). First, a Wizard-of-Oz (WOZ) experiment is conducted to collect a corpus of human-human-agent triadic conversations. The corpus is then analyzed to determine whether the prosodic features and head-direction information correlate with addressee-hood. Based on this analysis, an SVM classifier is trained to estimate the addressee by integrating the prosodic features with head-movement information. Finally, a prototype agent equipped with this real-time addressee-estimation mechanism is developed and evaluated.
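The classifier described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the specific features (mean F0, energy, duration, head yaw), their distributions, and the RBF kernel are assumptions, and the data here is synthetic.

```python
# Hypothetical sketch of an addressee classifier: an SVM fusing prosodic
# features with head-direction features. Feature names and the synthetic
# distributions below are illustrative assumptions, not from the paper.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic training samples: [mean_f0_hz, rms_energy, duration_s, head_yaw_deg]
# Label 1 = utterance addressed to the agent, 0 = addressed to the other user.
to_agent = np.column_stack([
    rng.normal(220, 20, n),   # assumed: raised pitch when talking to the agent
    rng.normal(0.7, 0.1, n),
    rng.normal(1.5, 0.4, n),
    rng.normal(0, 10, n),     # head roughly facing the agent (yaw near 0 deg)
])
to_user = np.column_stack([
    rng.normal(180, 20, n),
    rng.normal(0.5, 0.1, n),
    rng.normal(1.5, 0.4, n),
    rng.normal(60, 10, n),    # head turned toward the other participant
])
X = np.vstack([to_agent, to_user])
y = np.array([1] * n + [0] * n)

# Standardize features before the SVM so no single feature dominates.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# Classify a new utterance with agent-directed head pose and raised pitch.
pred = clf.predict([[225.0, 0.72, 1.4, 3.0]])
print(pred[0])
```

In a real-time system, the prosodic features would be extracted per utterance from the audio stream and the head yaw from a head tracker, with the trained model applied as each utterance ends.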
