Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research

One hundred thirty-three (133) sound/speech features extracted from pitch, Mel Frequency Cepstral Coefficients, energy and formants were evaluated in order to create a feature set sufficient to discriminate between seven emotions in acted speech. After the appropriate feature selection, Multilayer Perceptrons were trained for emotion recognition on the basis of a 23-input vector, which provides information about the prosody of the speaker over the entire sentence. Several experiments were performed and the results are presented analytically. Particular emphasis was placed on assessing the proposed 23-input vector in a speaker-independent framework, where speakers are not "known" to the classifier. The proposed feature vector achieved promising results (51%) for speaker-independent recognition across seven emotion classes. Moreover, on the problem of classifying high- and low-arousal emotions, our classifier reaches 86.8% recognition accuracy. The second classification model incorporated a Support Vector Machine with 35 predictive variables. The latter feature vector achieved promising results (78%) for speaker-independent recognition across seven emotion classes. Moreover, on the high- versus low-arousal problem, this classifier reaches 100% recognition for high-arousal and 87% for low-arousal emotions. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics could play a critical role in helping computers understand human emotions better.
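The pipeline described above (frame-level acoustic measurements summarized into a fixed-length utterance vector, then fed to a neural classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact 23 features, the feature-selection step, and the corpus are not reproduced here. The `prosodic_features` helper is hypothetical and computes only frame energy plus a crude autocorrelation pitch estimate, summarized by mean/std/min/max over the sentence; scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def prosodic_features(signal, sr=16000, frame_len=400, hop=160):
    """Toy sentence-level prosody vector: mean/std/min/max of
    frame energy and of an autocorrelation-based pitch estimate
    (a stand-in for the paper's pitch/energy/formant statistics)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    energies, pitches = [], []
    for f in frames:
        energies.append(float(np.sum(f ** 2)))
        # crude pitch: lag of the autocorrelation peak in 50-400 Hz
        ac = np.correlate(f, f, mode="full")[frame_len - 1:]
        lo, hi = sr // 400, sr // 50
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitches.append(sr / lag)
    stats = lambda x: [np.mean(x), np.std(x), np.min(x), np.max(x)]
    return np.array(stats(energies) + stats(pitches))

# Toy demo on synthetic "utterances": low- vs high-pitch sinusoids
# standing in for low- vs high-arousal speech.
rng = np.random.default_rng(0)
X, y = [], []
for label, f0 in [(0, 100.0), (1, 300.0)]:
    for _ in range(20):
        t = np.arange(16000) / 16000
        sig = np.sin(2 * np.pi * (f0 + rng.normal(0.0, 5.0)) * t)
        X.append(prosodic_features(sig))
        y.append(label)

# Multilayer Perceptron on the summarized feature vectors.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
).fit(X, y)
print(clf.score(X, y))
```

Swapping `MLPClassifier` for `sklearn.svm.SVC` mirrors the paper's second model; the surrounding pipeline stays identical, which is the practical appeal of separating utterance-level feature extraction from the choice of classifier.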
