Extracting Emotion from Speech: Towards Emotional Speech-Driven Facial Animations

Humans intuitively exploit facial expressions and characteristics of speech to infer the emotional state of their communication partners. This paper investigates ways to extract emotion from spontaneous speech, with the aim of transferring those emotions to appropriate facial expressions of the speaker's virtual representative. It thus presents one step towards an emotional speech-driven facial animation system, which promises to be the first truly non-human animation assistant. Different classification algorithms (support vector machines, neural networks, and decision trees) were compared on the task of extracting emotion from speech features. Results show that these machine-learning algorithms outperform human subjects at extracting emotion from speech alone, when no additional cues to the emotional state are available.
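The classifier comparison described above can be sketched as follows. This is a minimal illustration only: the paper does not specify an implementation, so scikit-learn is an assumed tool, and the feature values (hypothetical prosodic measures such as mean pitch, pitch range, energy, and speaking rate) and emotion labels are synthetic stand-ins rather than the paper's data.

```python
# Hedged sketch of comparing SVMs, neural networks, and decision trees
# on speech-derived features. Features and labels are synthetic; the
# paper's actual feature set and corpus are not reproduced here.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200
# Hypothetical prosodic features: mean pitch, pitch range, energy, speaking rate
X = rng.normal(size=(n_samples, 4))
# Hypothetical emotion labels, e.g. 0 = neutral, 1 = happy, 2 = angry
y = rng.integers(0, 3, size=n_samples)

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),
    "Decision tree": DecisionTreeClassifier(max_depth=5),
}

for name, clf in classifiers.items():
    # 5-fold cross-validation gives a comparable accuracy estimate per model
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

On real emotional-speech corpora, the choice of acoustic features typically matters as much as the choice of classifier, which is why the comparison is run on an identical feature matrix for all three models.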
