The Effect of Embodied Conversational Agents' Speech Quality on Users' Attention and Emotion

This study investigates the influence of the speech quality of Embodied Conversational Agents (ECAs) on users' perception, behavior, and emotions. Twenty-four subjects interacted in a Wizard of Oz (WOZ) setup with two ECAs in two scenarios of a virtual theater partner application. In both scenarios, each ECA used three different speech qualities: natural, high-quality synthetic, and low-quality synthetic. Eye-gaze data show that subjects' visual attention was influenced not by the ECAs' speech quality but by their appearance. In contrast, subjects' self-reported emotions and verbal descriptions of their perceptions were influenced by the ECAs' speech quality. Finally, Galvanic Skin Response data were influenced neither by the ECAs' appearance nor by their speech quality. These results underline the importance of correctly matching the auditory and visual modalities of ECAs and offer methodological insights for assessing users' perception, behavior, and emotions when interacting with virtual characters.
