Automatic emotion recognition for facial expression animation from speech

We present a framework for automatically generating facial expression animation for 3D talking heads from speech alone. Our system is trained on the German-language Berlin emotional speech dataset, which covers seven emotions. We first parameterize the speech signal with prosody-related and spectral features. We then investigate two classifier architectures for emotion recognition: Gaussian mixture model (GMM) and hidden Markov model (HMM) based classifiers. In experiments using 5-fold stratified cross-validation (SCV), a GMM classifier based on mel-frequency cepstral coefficients (MFCC) and dynamic MFCC features achieves an average emotion recognition rate of 83.42%. Decision fusion of two GMM classifiers, based on MFCC and line spectral frequency (LSF) features respectively, yields an average recognition rate of 85.30%. A second-stage decision fusion of this result with a prosody-based HMM classifier raises the average recognition rate to 86.45%. These results are encouraging for using automatic emotion recognition to drive facial expression animation synthesis.
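
As an illustration of the decision fusion step, the following minimal sketch combines the per-emotion scores of two classifiers and picks the highest fused score. The fusion rule shown here (a weighted sum of normalized log-likelihoods) is an assumption, since the abstract does not specify the rule; the function names, the weight parameter, the three-emotion label set, and the example scores are all hypothetical.

```python
import math
from typing import Dict


def normalize(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-class log-likelihoods to log-posteriors (log-softmax).

    This puts the scores of different classifiers on a comparable scale
    before they are combined.
    """
    peak = max(log_likelihoods.values())
    # Log-sum-exp, shifted by the maximum for numerical stability.
    log_total = peak + math.log(
        sum(math.exp(v - peak) for v in log_likelihoods.values())
    )
    return {label: v - log_total for label, v in log_likelihoods.items()}


def fuse_decide(scores_a: Dict[str, float],
                scores_b: Dict[str, float],
                weight_a: float = 0.5) -> str:
    """Return the label with the highest weighted sum of normalized scores.

    Assumed fusion rule (not taken from the paper):
        fused = weight_a * score_a + (1 - weight_a) * score_b
    """
    norm_a = normalize(scores_a)
    norm_b = normalize(scores_b)
    fused = {
        label: weight_a * norm_a[label] + (1.0 - weight_a) * norm_b[label]
        for label in norm_a
    }
    return max(fused, key=fused.get)


# Hypothetical log-likelihoods for one utterance from two classifiers.
mfcc_scores = {"anger": -10.0, "sadness": -12.0, "neutral": -11.0}
lsf_scores = {"anger": -14.0, "sadness": -9.0, "neutral": -13.0}

decision = fuse_decide(mfcc_scores, lsf_scores, weight_a=0.5)
print(decision)
```

With equal weights, the strong preference of the second classifier outweighs the weaker preference of the first, so the fused decision differs from what the first classifier would choose alone. A second fusion stage, such as the one with the prosody-based HMM classifier, could in principle reuse the same function on the fused scores, again under the assumed rule.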