Comparison of Different Classifiers for Emotion Recognition

In this paper, a comparison of two classifiers for emotion recognition from speech signals is presented. Recognition was performed on the Berlin Database of Emotional Speech. Within this work we concentrate on evaluating both speaker-dependent and speaker-independent emotion recognition. One hundred thirty-three (133) speech features were extracted through speech signal processing, and a basic set of 35 features was selected by a statistical method; artificial neural network and Random Forest classifiers were then applied. Seven emotion classes were categorized, namely anger, happiness, anxiety/fear, sadness, boredom, disgust, and neutral. In the speaker-dependent framework, the artificial neural network classifier reached an accuracy of 83.17% and Random Forest 77.19%. In the speaker-independent framework, the artificial neural network classifier reached a mean accuracy of 55%, while Random Forest reached a mean accuracy of 48%.
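To make the described pipeline concrete, the following is a minimal sketch using scikit-learn; this is an illustration under stated assumptions, not the authors' implementation. The paper does not name its toolkit, the ANOVA F-test stands in for the unspecified statistical selection method, the network and forest hyperparameters are arbitrary, and X and y are random placeholders rather than actual Berlin Database features (535 is the database's approximate utterance count).

    # Sketch: select 35 of 133 speech features, then compare an ANN (MLP)
    # against a Random Forest with cross-validation. All data below is
    # randomly generated placeholder input, not real acoustic features.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(535, 133))    # placeholder: 133 features per utterance
    y = rng.integers(0, 7, size=535)   # placeholder: 7 emotion class labels

    # Statistical feature selection down to a basic set of 35 features
    # (ANOVA F-test is one common choice; the paper's exact method may differ).
    ann = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=35),
                        MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000))
    rf = make_pipeline(SelectKBest(f_classif, k=35),
                       RandomForestClassifier(n_estimators=300, random_state=0))

    for name, clf in [("ANN", ann), ("Random Forest", rf)]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.2%}")

A speaker-independent evaluation would instead hold out all utterances of each test speaker (e.g. scikit-learn's GroupKFold with speaker IDs as groups), which is why the reported speaker-independent accuracies are substantially lower.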
