HMM-based synthesis of emotional facial expressions during speech in synthetic talking heads

One of the research goals in the human-computer interaction community is to build believable Embodied Conversational Agents (ECAs), that is, agents able to communicate complex information with human-like expressiveness and naturalness. Since emotions play a crucial role in human communication, and most of them are expressed through the face, making ECAs more believable means giving them the ability to display emotional facial expressions. This paper presents a system based on Hidden Markov Models (HMMs) for the synthesis of emotional facial expressions during speech. The HMMs were trained on a set of emotion examples in which a professional actor uttered Italian nonsense words while acting out various emotional facial expressions at different intensities. The experimental results were evaluated by comparing the "synthetic examples" (generated by the system) with a reference "natural example" (one of the actor's recordings) in three different ways. The evaluation shows that HMM-based synthesis of emotional facial expressions has some limitations but is suitable for making a synthetic Talking Head more expressive and realistic.
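
To make the approach concrete, the sketch below illustrates the general recipe rather than the authors' exact implementation: train one Gaussian-emission HMM per emotion on recorded sequences of facial animation parameters, then sample from the trained model to generate a new, synthetic expression trajectory. It assumes the third-party hmmlearn library; the feature dimensionality, state count, and the random stand-in data are hypothetical placeholders.

```python
import numpy as np
from hmmlearn import hmm  # third-party library: pip install hmmlearn

# Stand-in training data: 10 recorded examples of one emotion, each a
# (T_i, D) array whose rows are D-dimensional facial-animation-parameter
# vectors (e.g., MPEG-4 FAPs) captured at successive video frames.
# Random data is used here only so the sketch runs end to end.
rng = np.random.default_rng(0)
D = 12  # hypothetical number of facial animation parameters
training_sequences = [rng.standard_normal((rng.integers(80, 120), D))
                      for _ in range(10)]

# hmmlearn expects all sequences concatenated, plus per-sequence lengths.
X = np.concatenate(training_sequences)
lengths = [len(seq) for seq in training_sequences]

# Fit a 5-state Gaussian-emission HMM with Baum-Welch (EM); one such
# model would be trained per emotion (and per intensity, if desired).
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(X, lengths)

# Synthesis: sample a fresh parameter trajectory from the trained model.
synthetic_frames, state_path = model.sample(150)  # ~150 video frames
print(synthetic_frames.shape)                     # (150, 12)
```

In a full system, the sampled trajectory would typically be smoothed and streamed to the talking head's animation engine in synchrony with the speech.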
