Multi-stage classification of emotional speech motivated by a dimensional emotion model

This paper deals with speech emotion analysis in the context of growing awareness of the broad application potential of affective computing. Unlike most work in the literature, which relies mainly on classical frequency- and energy-based features together with a single global classifier for emotion recognition, we propose new harmonic and Zipf-based features for better characterization of speech emotion in the valence dimension, and a multi-stage classification scheme driven by a dimensional emotion model for better discrimination between emotional classes. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach proves effective, achieving a 68.60% classification rate, which rises to 71.52% when gender classification is applied first. On the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance reported in the literature on the same dataset is, to our knowledge, 76.15%.
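The multi-stage idea described above — a coarse split along one dimension of the emotion model, followed by finer discrimination within each group — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the arousal grouping, the synthetic features, and the nearest-centroid classifiers are all assumptions standing in for the real feature set and classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arousal grouping of emotions (illustrative, not the paper's
# exact partition): stage 1 separates high- from low-arousal speech,
# stage 2 discriminates emotions within each arousal group.
AROUSAL = {"anger": 1, "joy": 1, "fear": 1, "sadness": 0, "boredom": 0, "neutral": 0}

def centroids(X, y):
    """Per-class mean feature vectors (a toy stand-in for trained classifiers)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest(x, cents):
    """Label of the centroid closest to feature vector x."""
    return min(cents, key=lambda c: np.linalg.norm(x - cents[c]))

# Toy 4-dimensional features: each emotion is drawn around a distinct mean.
labels = list(AROUSAL)
X = np.vstack([rng.normal(loc=i, scale=0.3, size=(20, 4)) for i in range(len(labels))])
y = np.array([lab for lab in labels for _ in range(20)])

# Stage 1: arousal classifier; stage 2: one classifier per arousal group.
arousal_y = np.array([AROUSAL[lab] for lab in y])
stage1 = centroids(X, arousal_y)
stage2 = {g: centroids(X[arousal_y == g], y[arousal_y == g]) for g in (0, 1)}

def classify(x):
    g = nearest(x, stage1)        # coarse: high vs. low arousal
    return nearest(x, stage2[g])  # fine: emotion within that group

acc = np.mean([classify(x) == lab for x, lab in zip(X, y)])
```

The gender-first variant reported in the abstract follows the same pattern with one more stage in front: a gender classifier routes each utterance to a gender-specific emotion classifier.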

[1]  Rosalind W. Picard Affective computing: (526112012-054) , 1997 .

[2]  K. Scherer,et al.  Acoustic profiles in vocal emotion expression. , 1996, Journal of personality and social psychology.

[3]  Oudeyer Pierre-Yves,et al.  The production and recognition of emotions in speech: features and algorithms , 2003 .

[4]  S. Havlin The distance between Zipf plots , 1995 .

[5]  Allison Druin,et al.  Robots for Kids: Exploring New Technologies for Learning , 2000 .

[6]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[7]  Elisabeth André,et al.  Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[8]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[9]  W. Sendlmeier,et al.  Verification of acoustical correlates of emotional speech using formant-synthesis , 2000 .

[10]  Pierre-Yves Oudeyer,et al.  The production and recognition of emotions in speech: features and algorithms , 2003, Int. J. Hum. Comput. Stud..

[11]  L. Lamel,et al.  Emotion detection in task-oriented spoken dialogues , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[12]  Liming Chen,et al.  Voice-Based Gender Identification in Multimedia Applications , 2005, Journal of Intelligent Information Systems.

[13]  Philipp Goedeking,et al.  Primate Vocal Communication , 1988, Springer Berlin Heidelberg.

[14]  Arvid Kappas,et al.  Primate Vocal Expression of Affective State , 1988 .

[15]  Björn W. Schuller,et al.  Brute-forcing hierarchical functionals for paralinguistics: A waste of feature space? , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[16]  E. Owens,et al.  An Introduction to the Psychology of Hearing , 1997 .

[17]  Astrid Paeschke,et al.  A database of German emotional speech , 2005, INTERSPEECH.

[18]  P. Juslin,et al.  Cue Utilization in Communication of Emotion in Music Performance: Relating Performance to Perception Studies of Music Performance , 2022 .

[19]  Alex Waibel,et al.  EMOTION-SENSITIVE HUMAN-COMPUTER INTERFACES , 2000 .

[20]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[21]  Åsa Abelin,et al.  Cross linguistic interpretation of emotional prosody , 2002 .

[22]  Cynthia Breazeal,et al.  Designing sociable robots , 2002 .

[23]  Machiko Kusahara The Art of Creating Subjective Reality: An Analysis of Japanese Digital Pets , 2001, Leonardo.

[24]  Paul Sajda,et al.  Role of feature selection in building pattern recognizers for computer-aided diagnosis , 1998, Medical Imaging.

[25]  Klaus R. Scherer,et al.  Vocal communication of emotion: A review of research paradigms , 2003, Speech Commun..

[26]  Zhongzhe Xiao,et al.  Two-stage Classification of Emotional Speech , 2006, International Conference on Digital Telecommunications (ICDT'06).

[27]  Liyanage C. De Silva,et al.  Voting ensembles for spoken affect classification , 2007, J. Netw. Comput. Appl..

[28]  Rosalind W. Picard Affective Computing , 1997 .

[29]  D. Watson,et al.  Toward a consensual structure of mood. , 1985, Psychological bulletin.

[30]  Björn W. Schuller,et al.  Speaker Independent Speech Emotion Recognition by Ensemble Classification , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[31]  Lawrence R. Rabiner,et al.  A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition , 1976 .

[32]  Constantine Kotropoulos,et al.  Emotional Speech Classification Using Gaussian Mixture Models and the Sequential Floating Forward Selection Algorithm , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[33]  Björn W. Schuller,et al.  Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[34]  Zhongzhe Xiao,et al.  Hierarchical Classification of Emotional Speech , 2007 .

[35]  Nicole Vincent,et al.  ZIPF ANALYSIS OF AUDIO SIGNALS , 2004 .

[36]  K. Scherer,et al.  Appraisal processes in emotion: Theory, methods, research. , 2001 .

[37]  Malcolm Slaney,et al.  Baby Ears: a recognition system for affective vocalizations , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[38]  Zhongzhe Xiao,et al.  Features extraction and selection for emotional speech classification , 2005, IEEE Conference on Advanced Video and Signal Based Surveillance, 2005..

[39]  Ricco Rakotomalala,et al.  TANAGRA : un logiciel gratuit pour l'enseignement et la recherche , 2005, EGC.

[40]  Björn W. Schuller,et al.  Evolutionary Feature Generation in Speech Emotion Recognition , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[41]  M. Yik A circumplex model of affect and its relation to personality : a five-language study , 1999 .

[42]  A. Tickle,et al.  ENGLISH AND JAPANESE SPEAKERS ’ EMOTION VOCALISATION AND RECOGNITION : A COMPARISON HIGHLIGHTING VOWEL QUALITY , 2000 .

[43]  Klaus R. Scherer,et al.  Can automatic speaker verification be improved by training the algorithms on emotional speech? , 2000, INTERSPEECH.

[44]  Piotr Synak,et al.  Extracting Emotions from Music Data , 2005, ISMIS.

[45]  E. M. Wright,et al.  Adaptive Control Processes: A Guided Tour , 1961, The Mathematical Gazette.

[46]  Roddy Cowie,et al.  Automatic recognition of emotion from voice: a rough benchmark , 2000 .

[47]  Zhongzhe Xiao,et al.  Automatic Hierarchical Classification of Emotional Speech , 2007, Ninth IEEE International Symposium on Multimedia Workshops (ISMW 2007).

[48]  Cecile Pereira DIMENSIONS OF EMOTIONAL MEANING IN SPEECH , 2000 .

[49]  Rosario N. Mantegna,et al.  Numerical Analysis of Word Frequencies in Artificial and Natural Language Texts , 1997 .

[50]  Donald G. Childers,et al.  Silent and voiced/unvoiced/mixed excitation (four-way) classification of speech , 1989, IEEE Trans. Acoust. Speech Signal Process..

[51]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[52]  R. Thayer The biopsychology of mood and arousal , 1989 .

[53]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[54]  Björn W. Schuller,et al.  Hidden Markov model-based speech emotion recognition , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[55]  Constantine Kotropoulos,et al.  Automatic speech classification to five emotional states based on gender information , 2004, 2004 12th European Signal Processing Conference.

[56]  Ioannis Pitas,et al.  Automatic emotional speech classification , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.