Improved emotion recognition with large set of statistical features

This paper presents and discusses speaker-dependent emotion recognition with a large set of statistical features. Speaker-dependent emotion recognition currently achieves the best accuracy. Recognition was performed on the English, Slovenian, Spanish, and French InterFace emotional speech databases. All databases include nine speakers. The InterFace databases cover a neutral speaking style and six emotions: disgust, surprise, joy, fear, anger, and sadness. Speech features for emotion recognition were determined in two steps: in the first step, acoustical features were defined, and in the second, statistical features were calculated from them. The acoustical features comprise pitch, the derivative of pitch, energy, the derivative of energy, the duration of speech segments, jitter, and shimmer. The statistical features are statistical summaries of the acoustical features. In a previous study the feature vector contained 26 elements; in this study it contains 144 elements. The new feature set is called the large set of statistical features. Emotion recognition was performed using artificial neural networks. A significant improvement was achieved for all speakers except the Slovenian male speaker and the second English male speaker, where the improvement was about 2%. On average, the large set of statistical features improves emotion recognition accuracy by about 18%.
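The two-step scheme above (acoustic contours first, statistical summaries second) can be illustrated with a minimal sketch. The helper below summarizes one acoustic contour (e.g. a frame-wise pitch or energy track) and its first derivative with basic statistics; the paper's full 144-element vector stacks many such summaries across pitch, energy, durations, jitter, and shimmer. The function name and the particular statistics chosen here are illustrative assumptions, not the authors' exact feature set.

```python
import statistics

def statistical_features(contour):
    """Summarize an acoustic contour and its first derivative with
    basic statistics. The concrete statistics used here are an
    illustrative subset, not the paper's exact 144-element set."""
    # First derivative of the contour (frame-to-frame differences).
    delta = [b - a for a, b in zip(contour, contour[1:])]
    feats = {}
    for name, x in (("contour", contour), ("delta", delta)):
        feats[name + "_mean"] = statistics.fmean(x)
        feats[name + "_stdev"] = statistics.pstdev(x)
        feats[name + "_min"] = min(x)
        feats[name + "_max"] = max(x)
        feats[name + "_range"] = max(x) - min(x)
        feats[name + "_median"] = statistics.median(x)
    return feats

# Example: a short (hypothetical) pitch contour in Hz.
pitch = [120.0, 125.0, 130.0, 128.0, 122.0]
feats = statistical_features(pitch)
```

Concatenating such summaries for every acoustic feature (and its derivative) yields one fixed-length vector per utterance, which is what a feed-forward neural network classifier expects as input.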
