Statistical Evaluation of Speech Features for Emotion Recognition

This paper presents an emotion recognition framework based on sound processing that could significantly improve human-computer interaction. One hundred thirty-three (133) speech features obtained from sound processing of acted speech were tested in order to create a feature set sufficient to discriminate between seven emotions. Following statistical analysis to assess the significance of each speech feature, artificial neural networks were trained to classify emotions on the basis of a 35-input vector, which provides information about the prosody of the speaker over the entire sentence. Extra emphasis was given to assessing the proposed 35-input vector in a speaker-independent framework, since test instances belong to different speakers from the training set. Several experiments were performed and the results are presented analytically. Considering the inherent difficulty of the problem, the proposed feature vector achieved promising results (51%) for speaker-independent recognition across the seven emotion classes of the Berlin Database.
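The statistical screening step described above (ranking many candidate speech features by how well they separate emotion classes) could be sketched with a nonparametric per-feature test. The following is an illustrative Kruskal-Wallis filter in pure Python; the test choice, feature, and sample values are assumptions for demonstration, not the paper's actual procedure or data:

```python
# Illustrative sketch: score one candidate speech feature by a
# Kruskal-Wallis H statistic across emotion classes. A higher H suggests
# the feature's distribution differs more strongly between emotions.
# All values below are invented for demonstration.

def ranks(values):
    """Assign 1-based ranks to values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_wallis_h(groups):
    """H statistic for a list of sample groups (one group per emotion)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    r = ranks(pooled)
    h = 0.0
    idx = 0
    for g in groups:
        rank_sum = sum(r[idx:idx + len(g)])
        h += rank_sum * rank_sum / len(g)
        idx += len(g)
    return 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

# Hypothetical per-emotion samples of a single pitch-related feature (Hz):
anger   = [210.0, 225.0, 232.0]
sadness = [140.0, 150.0, 155.0]
neutral = [170.0, 180.0, 175.0]
print(kruskal_wallis_h([anger, sadness, neutral]))  # prints 7.2
```

In a full filter, one would compute H (or another significance score) for each of the 133 candidate features and retain the top-scoring subset, e.g. 35 features, before training the neural network.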
