Toward a Guide Agent That Actively Intervenes in Inter-user Conversation - Timing Definition and a Trial of Automatic Detection Using Low-level Nonverbal Features

With the advance of embodied conversational agent (ECA) technologies, ECAs are increasingly deployed in real-world applications; guides in museums and exhibitions are typical examples. In these settings, the agent system usually has to engage a group of visitors rather than a single one. Such multi-user situations are considerably more complex than single-user ones and require additional specialized capabilities. One of them is the ability of the agent to smoothly intervene in user-user conversation. To realize this, a Wizard-of-Oz (WOZ) experiment was first conducted to collect human interaction data. Analysis of the collected corpus identified four kinds of timings at which the agent can potentially intervene. Recruited evaluators then annotated the corpus with these timings using a dedicated, intuitive tool. Finally, as a trial of automatic detection of these timings, a classifier based on low-level nonverbal features achieved moderate accuracy.
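The detection step can be framed as windowed multi-class classification over low-level nonverbal signals. The following is a minimal sketch of that framing only; the feature set, window labels, classifier choice (a scikit-learn random forest), and the synthetic data are all assumptions for illustration, not the paper's reported setup.

```python
# Sketch: intervention-timing detection as windowed classification.
# Feature names, label scheme, and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-window features for a two-visitor conversation:
# [speech_ratio_u1, speech_ratio_u2, overlap_ratio, silence_ratio,
#  gaze_at_agent_u1, gaze_at_agent_u2, mutual_gaze_ratio, mean_energy]
X = rng.random((500, 8))

# Labels: 0 = no intervention opportunity, 1..4 = the four timing types
# defined from the WOZ corpus (placeholder labels for illustration).
y = rng.integers(0, 5, size=500)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

With real annotated windows in place of the random arrays, the same pipeline would yield the kind of per-timing accuracy figures the trial reports.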
