A Dimensional Emotion Model Driven Multi-stage Classification of Emotional Speech

This paper addresses speech emotion analysis, motivated by the growing awareness of the wide application potential of affective computing. Unlike most works in the literature, which rely mainly on classical frequency- and energy-based features together with a single global classifier for emotion recognition, we propose new harmonic and Zipf-based features for a better characterization of speech emotion in terms of timbre, rhythm, and prosody, together with a multi-stage classification scheme driven by a dimensional emotion model for better discrimination between emotional classes. Evaluated on the Berlin dataset [1] with 68 features and six emotion states, our approach proves effective, achieving a 68.60% classification rate and reaching 71.52% when gender classification is applied first. On the DES dataset, which contains five emotion states, our approach achieves an 81% recognition rate, whereas the best performance reported in the literature on the same dataset is, to our knowledge, 66% [2].
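
To make the Zipf-based feature idea concrete, here is a minimal sketch that codes a waveform into Up/Down/Flat symbols, counts short symbol patterns, and measures the slope of the log-log rank-frequency (Zipf) plot. The coding scheme, tolerance, and pattern length are illustrative assumptions, not the paper's actual feature definitions.

```python
# Minimal sketch of one possible Zipf-based audio feature.
# Assumptions (not from the paper): a simple Up/Down/Flat coding of
# sample-to-sample transitions and overlapping 3-symbol "words".
import numpy as np

def zipf_slope(signal, tol=1e-4, word_len=3):
    """Slope of the log-log rank-frequency plot of symbol patterns."""
    diff = np.diff(np.asarray(signal, dtype=float))
    # Code each transition as Up, Down, or Flat.
    symbols = np.where(diff > tol, "U", np.where(diff < -tol, "D", "F"))
    # Build overlapping fixed-length words and count their occurrences.
    words = ["".join(symbols[i:i + word_len])
             for i in range(len(symbols) - word_len + 1)]
    _, counts = np.unique(words, return_counts=True)
    freqs = np.sort(counts)[::-1]                 # frequencies by rank
    ranks = np.arange(1, len(freqs) + 1)
    # Least-squares fit in log-log space; a slope near -1 is Zipf-like.
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope
```

A statistic such as this slope, computed over one or more codings of the signal, could sit alongside the harmonic and prosodic descriptors in a feature vector like the 68-dimensional one used here.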

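The multi-stage classification scheme can likewise be sketched as a cascade in which a dimensional (arousal/valence) view of emotion drives coarse-to-fine decisions. The sketch below assumes scikit-learn SVMs, an arousal-first split, and a particular grouping of six Berlin emotion states; the stage layout, groupings, and classifiers actually used in the paper may differ.

```python
# Minimal sketch of a dimensional-emotion-model-driven multi-stage
# classifier. The arousal grouping below is an assumption made for
# illustration, not the paper's actual hierarchy.
import numpy as np
from sklearn.svm import SVC

HIGH_AROUSAL = {"anger", "joy", "fear"}          # assumed active emotions
LOW_AROUSAL = {"sadness", "boredom", "neutral"}  # assumed passive emotions

class MultiStageEmotionClassifier:
    def __init__(self):
        self.arousal_clf = SVC()   # stage 1: high vs. low arousal
        self.high_clf = SVC()      # stage 2a: discriminate within high arousal
        self.low_clf = SVC()       # stage 2b: discriminate within low arousal

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        high = np.array([label in HIGH_AROUSAL for label in y])
        self.arousal_clf.fit(X, high)
        self.high_clf.fit(X[high], y[high])
        self.low_clf.fit(X[~high], y[~high])
        return self

    def predict(self, X):
        X = np.asarray(X)
        high = self.arousal_clf.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        if high.any():
            out[high] = self.high_clf.predict(X[high])
        if (~high).any():
            out[~high] = self.low_clf.predict(X[~high])
        return out
```

A gender classifier could be prepended as an extra stage in the same cascaded fashion, training one such hierarchy per gender, in the spirit of the gender-first variant that reaches 71.52% on the Berlin dataset.
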
[1] George Kingsley Zipf. Human Behavior and the Principle of Least Effort, 1949.

[2] Lawrence R. Rabiner et al. A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, 1976.

[3] E. Owens et al. An Introduction to the Psychology of Hearing, 1997.

[4] Arvid Kappas et al. Primate Vocal Expression of Affective State, 1988.

[5] Donald G. Childers et al. Silent and voiced/unvoiced/mixed excitation (four-way) classification of speech. IEEE Trans. Acoust. Speech Signal Process., 1989.

[6] S. Havlin. The distance between Zipf plots, 1995.

[7] K. Scherer et al. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 1996.

[8] Rosario N. Mantegna et al. Numerical Analysis of Word Frequencies in Artificial and Natural Language Texts, 1997.

[9] Malcolm Slaney et al. Baby Ears: a recognition system for affective vocalizations. Proceedings of ICASSP '98, 1998.

[10] Paul Sajda et al. Role of feature selection in building pattern recognizers for computer-aided diagnosis. Medical Imaging, 1998.

[11] Klaus R. Scherer et al. Can automatic speaker verification be improved by training the algorithms on emotional speech? INTERSPEECH, 2000.

[12] Cecile Pereira. Dimensions of emotional meaning in speech, 2000.

[13] Roddy Cowie et al. Automatic recognition of emotion from voice: a rough benchmark, 2000.

[14] W. Sendlmeier et al. Verification of acoustical correlates of emotional speech using formant-synthesis, 2000.

[15] Alex Waibel et al. Emotion-sensitive human-computer interfaces, 2000.

[16] Allison Druin et al. Robots for Kids: Exploring New Technologies for Learning, 2000.

[17] A. Tickle et al. English and Japanese speakers' emotion vocalisation and recognition: a comparison highlighting vowel quality, 2000.

[18] Machiko Kusahara. The Art of Creating Subjective Reality: An Analysis of Japanese Digital Pets. Leonardo, 2001.

[19] E. Vesterinen et al. Affective Computing. Encyclopedia of Biometrics, 2009.

[20] K. Scherer et al. Appraisal processes in emotion: Theory, methods, research, 2001.

[21] Ian H. Witten et al. Data mining: practical machine learning tools and techniques with Java implementations. SIGMOD Record, 2002.

[22] Åsa Abelin et al. Cross linguistic interpretation of emotional prosody, 2002.

[23] Paul Boersma et al. Praat, a system for doing phonetics by computer, 2002.

[24] Pierre-Yves Oudeyer. The production and recognition of emotions in speech: features and algorithms. Int. J. Hum. Comput. Stud., 2003.

[25] Klaus R. Scherer. Vocal communication of emotion: A review of research paradigms. Speech Commun., 2003.

[26] Nicole Vincent et al. Zipf analysis of audio signals, 2004.

[27] Constantine Kotropoulos et al. Automatic speech classification to five emotional states based on gender information. 12th European Signal Processing Conference (EUSIPCO), 2004.

[28] Ioannis Pitas et al. Automatic emotional speech classification. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[29] Liming Chen et al. Voice-Based Gender Identification in Multimedia Applications. Journal of Intelligent Information Systems, 2005.

[30] Constantine Kotropoulos et al. Emotional Speech Classification Using Gaussian Mixture Models and the Sequential Floating Forward Selection Algorithm. IEEE International Conference on Multimedia and Expo (ICME), 2005.

[31] Ricco Rakotomalala. TANAGRA: un logiciel gratuit pour l'enseignement et la recherche [TANAGRA: free software for teaching and research]. EGC, 2005.

[32] Astrid Paeschke et al. A database of German emotional speech. INTERSPEECH, 2005.

[33] Piotr Synak et al. Extracting Emotions from Music Data. ISMIS, 2005.

[34] Zhongzhe Xiao et al. Two-stage Classification of Emotional Speech. International Conference on Digital Telecommunications (ICDT'06), 2006.

[35] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[36] Zhongzhe Xiao et al. Hierarchical Classification of Emotional Speech, 2007.

[37] Zhongzhe Xiao et al. Automatic Hierarchical Classification of Emotional Speech. Ninth IEEE International Symposium on Multimedia Workshops (ISMW 2007), 2007.