Problems of Automatic Emotion Recognition in Spontaneous Speech: An Example of Recognition in a Dispatcher Center

This paper discusses the numerous difficulties in examining emotions that occur in continuous spontaneous speech, and then presents several emotion recognition experiments that use the clause as the recognition unit. In a preliminary experiment on a spontaneous speech database, we examined which acoustic features are most important for characterizing emotions. An SVM classifier was built to classify the four most frequent emotions. We found that the fundamental frequency, the energy, and their dynamics within a clause are the main parameters characterizing emotions, and that averaged spectral information, such as MFCCs and harmonicity, is also very important. In a real-life experiment, an automatic recognition system was prepared for a telecommunication call center. Summing up the results of these experiments, we conclude that the clause can be an optimal unit for recognizing emotions in continuous speech.
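The classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-clause feature vector (F0 and energy statistics, MFCC means, a harmonicity measure) is filled with synthetic values here, the four emotion labels are assumed, and scikit-learn's SVM is used as a stand-in for whatever SVM toolkit the paper employed.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Assumed label set; the paper only says "the four most frequent emotions".
EMOTIONS = ["neutral", "anger", "sadness", "joy"]

rng = np.random.default_rng(0)
n_per_class = 40
n_features = 16  # e.g. F0/energy stats + 13 MFCC means + harmonicity (HNR)

# Synthetic per-clause feature vectors: each emotion class is given a
# distinct mean so the SVM has separable structure to learn.
X = np.vstack([
    rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
    for i in range(len(EMOTIONS))
])
y = np.repeat(np.arange(len(EMOTIONS)), n_per_class)

# Standardize features, then fit an RBF-kernel SVM (a common default
# for acoustic emotion features).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)

# Classify a new clause-level feature vector.
pred = clf.predict(X[:1])
print(EMOTIONS[pred[0]])
```

In a real pipeline, the synthetic `X` would be replaced by features extracted from clause-segmented recordings, and accuracy would be estimated with cross-validation rather than on the training set.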
