Speech-Driven Automatic Facial Expression Synthesis

This paper addresses the problem of automatically generating speech-synchronous facial expressions for 3D talking heads. The proposed system is speaker- and language-independent. We parameterize the speech data with prosody-related and spectral features, together with their first- and second-order derivatives, and classify the seven emotions in the dataset with two different classifiers: Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The probability density function of the spectral feature space is modeled with a GMM for each emotion, while the temporal patterns of the emotion-dependent prosody contours are modeled with an HMM-based classifier. We use the Berlin Emotional Speech database (EMO-DB) [1] in our experiments. The GMM classifier achieves the best overall recognition rate, 82.85%, when cepstral features with delta and acceleration coefficients are used. The HMM-based classifier yields lower recognition rates than the GMM-based classifier; however, the fusion of the two classifiers achieves an average recognition rate of 83.80%. Experimental results on automatic facial expression synthesis are encouraging.
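As a concrete illustration of the per-emotion GMM classification step described above, the sketch below fits one Gaussian mixture per emotion on feature frames and labels an utterance by the model with the highest average log-likelihood. The emotion names, feature dimensionality, mixture size, and synthetic training data are all illustrative assumptions, not the paper's actual configuration; the real system would use cepstral features with delta and acceleration coefficients extracted from EMO-DB.

```python
# Hedged sketch of GMM-based emotion classification (synthetic stand-in data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
EMOTIONS = ["anger", "joy", "sadness"]  # EMO-DB has seven emotions; three shown here
DIM = 39                                # e.g. 13 MFCCs + deltas + accelerations

# Synthetic per-emotion training frames; each emotion gets a different mean
# so the classes are separable. Real input would be cepstral feature frames.
train = {e: rng.normal(loc=i, scale=1.0, size=(200, DIM))
         for i, e in enumerate(EMOTIONS)}

# Fit one diagonal-covariance GMM per emotion on that emotion's frames.
models = {e: GaussianMixture(n_components=4, covariance_type="diag",
                             random_state=0).fit(X)
          for e, X in train.items()}

def classify(frames):
    """Return the emotion whose GMM gives the highest average log-likelihood."""
    scores = {e: m.score(frames) for e, m in models.items()}
    return max(scores, key=scores.get)

# Frames drawn near the "sadness" training distribution (mean 2.0).
test_frames = rng.normal(loc=2.0, scale=1.0, size=(50, DIM))
print(classify(test_frames))
```

Classifier fusion, as in the paper, would then combine these GMM likelihood scores with scores from HMMs trained on prosody contours before making the final decision.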

[1] A. Tanju Erdem et al., "A new method for generating 3-D face models for personalized user interaction," 2005 13th European Signal Processing Conference, 2005.

[2] Shrikanth S. Narayanan et al., "Toward detecting emotions in spoken dialogs," IEEE Transactions on Speech and Audio Processing, 2005.

[3] A. Murat Tekalp et al., "Multimodal speaker identification using an adaptive classifier cascade based on modality reliability," IEEE Transactions on Multimedia, 2005.

[4] Kyu-Sik Park et al., "Speech emotion pattern recognition agent in mobile communication environment using fuzzy-SVM," ICFIE, 2007.

[5] Andreas Wendemuth et al., "Tuning hidden Markov model for speech emotion recognition," 2007.

[6] Fakhri Karray et al., "Speech emotion recognition using Gaussian mixture vector autoregressive models," 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007.

[7] Björn W. Schuller et al., "Hidden Markov model-based speech emotion recognition," 2003 International Conference on Multimedia and Expo (ICME), 2003.

[8] Justine Cassell et al., "Requirements for an architecture for embodied conversational characters," Computer Animation and Simulation, 1999.

[9] Matthew Brand et al., "Voice puppetry," SIGGRAPH, 1999.

[10] Pierre-Yves Oudeyer et al., "The production and recognition of emotions in speech: features and algorithms," International Journal of Human-Computer Studies, 2003.

[11] Mark Steedman et al., "Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents," SIGGRAPH, 1994.