Towards Believable Behavior Generation for Embodied Conversational Agents

This paper reports on the generation of coordinated multimodal output for the NICE (Natural Interactive Communication for Edutainment) system [1]. In its first prototype, the system supports fun and experientially rich interaction between human users, primarily 10 to 18 years old, and the 3D-embodied fairy tale author H.C. Andersen in his study. User input consists of domain-oriented spoken conversation combined with 2D gesture input, entered via a mouse-compatible device. The animated character can move about and interact with his environment as well as communicate with the user through spoken conversation and non-verbal gesture, body posture, facial expression, and gaze. The approach described here aims to make the virtual agent's appearance, voice, actions, and communicative behavior convey the impression of a character with human-like behavior, emotions, relevant domain knowledge, and a distinct personality. We propose an approach to multimodal output generation that exploits a richly parameterized semantic instruction from the conversation manager and splits it into synchronized text instructions for the text-to-speech synthesizer and behavioral instructions for the animated character. Building on the implemented version of this approach, we are creating a behavior sub-system that combines the described multimodal output instructions with parameters representing the character's current emotional state, producing animations that express that state through speech and non-verbal behavior.
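To make the described split concrete, the sketch below illustrates one common way such coordination can be realized: inline behavior tags in the semantic output string are converted into text-to-speech bookmarks plus a list of behavior instructions keyed to those bookmarks, and an emotion pass then modulates the resulting behaviors. This is a minimal sketch under assumed conventions; the tag syntax and the names BehaviorInstruction, split_instruction, and apply_emotion are illustrative assumptions, not the NICE system's actual interfaces.

```python
# A minimal sketch, not the NICE system's actual interfaces: the tag syntax,
# class names, and functions below are illustrative assumptions.
import re
from dataclasses import dataclass, field

@dataclass
class BehaviorInstruction:
    action: str    # e.g. "gesture:sweep" or "gaze:pictures"
    bookmark: int  # id of the TTS bookmark that triggers this behavior

@dataclass
class OutputPlan:
    tts_markup: str                                # text with embedded bookmarks
    behaviors: list = field(default_factory=list)  # behaviors keyed to bookmarks

# Assumed inline tag convention for behavior annotations, e.g. "[gesture:sweep]".
TAG = re.compile(r"\[(\w+):(\w+)\]")

def split_instruction(text: str) -> OutputPlan:
    """Split a semantic output string with inline behavior tags into TTS
    markup with synchronization bookmarks and a list of behavior
    instructions keyed to those bookmarks."""
    behaviors, pieces, bookmark, last = [], [], 0, 0
    for m in TAG.finditer(text):
        pieces.append(text[last:m.start()])
        bookmark += 1
        # SSML-style mark: the synthesizer reports when it reaches the mark,
        # letting the animation engine fire the behavior in sync with speech.
        pieces.append(f'<mark name="b{bookmark}"/>')
        behaviors.append(BehaviorInstruction(f"{m.group(1)}:{m.group(2)}", bookmark))
        last = m.end()
    pieces.append(text[last:])
    return OutputPlan("".join(pieces), behaviors)

def apply_emotion(plan: OutputPlan, arousal: float) -> OutputPlan:
    """Hypothetical emotion pass: derive a behavior intensity from a single
    arousal parameter so the renderer can, e.g., widen gestures when the
    character is excited."""
    intensity = max(0.0, min(1.0, 0.5 + 0.5 * arousal))
    for b in plan.behaviors:
        b.action += f"?intensity={intensity:.2f}"
    return plan

if __name__ == "__main__":
    plan = split_instruction(
        "Welcome to my study! [gesture:sweep] Over there [gaze:pictures] "
        "you can see pictures from my travels."
    )
    plan = apply_emotion(plan, arousal=0.6)
    print(plan.tts_markup)
    for b in plan.behaviors:
        print(b)
```

Bookmark-based synchronization of this kind keeps non-verbal behavior aligned with speech timing without hard-coding durations, and the emotion pass marks the point at which state parameters could modulate the same instructions, as the planned behavior sub-system is intended to do.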

[1] D. Massaro et al. Development and Evaluation of a Computer-Animated Tutor for Vocabulary and Language Learning in Children with Autism. Journal of Autism and Developmental Disorders, 2003.

[2] A. B. Loyall. Believable Agents: Building Interactive Personalities. Ph.D. thesis, Carnegie Mellon University, 1997.

[3] H. McGurk and J. MacDonald. Hearing Lips and Seeing Voices. Nature, 1976.

[4] T. Koda and P. Maes. Agents with Faces: The Effects of Personification of Agents. 1996.

[5] N. O. Bernsen et al. Designing Interactive Speech Systems: From First Ideas to User Testing. Springer, 1998.

[6] M. L. Abercrombie et al. Non-Verbal Communication. Proceedings of the Royal Society of Medicine, 1972.

[7] P. Ekman and W. V. Friesen. Nonverbal Leakage and Clues to Deception. Psychiatry, 1969.

[8] C. Pelachaud et al. Embodied Contextual Agent in Information Delivering Application. AAMAS '02, 2002.

[9] B. Reeves and C. Nass. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. 1996.

[10] C. Nass. Truth Is Beauty: Researching Embodied Conversational Agents. 2001.

[11] M. Argyle. Bodily Communication, 2nd ed. 1988.

[12] D. W. Massaro. Development and Evaluation of a Computer-Animated Tutor for Language and Vocabulary Learning. 2003.

[13] N. O. Bernsen et al. First Prototype of Conversational H.C. Andersen. AVI '04, 2004.

[14] D. W. Massaro. Speech Perception in Perceivers with Hearing Loss: Synergy of Multiple Modalities. Journal of Speech, Language, and Hearing Research, 1999.

[15] J. Cassell et al. Embodiment in Conversational Interfaces: Rea. CHI '99, 1999.

[16] J. Cassell et al. (eds.). Embodied Conversational Agents. MIT Press, 2000.

[17] S. T. Fiske and S. E. Taylor. Social Cognition, 2nd ed. 1991.

[18] E. Vesterinen et al. Affective Computing. Encyclopedia of Biometrics, 2009.

[19] N. O. Bernsen et al. Designing Interactive Speech Systems. Springer London, 1998.