Recognition of emotions in speech by a hierarchical approach

This paper deals with speech emotion analysis in the context of growing awareness of the wide application potential of affective computing. Unlike most work in the literature, which relies mainly on classical frequency- and energy-based features together with a single global classifier, we propose new harmonic and Zipf-based features for better characterization of speech emotion in the valence dimension, along with a multistage classification scheme driven by a dimensional emotion model for better discrimination of emotional classes. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach proves effective, achieving a 68.60% classification rate that rises to 71.52% when gender classification is applied first. On the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance reported in the literature on the same dataset is, to our knowledge, 76.15%.
