Automatic Emotion Recognition by the Speech Signal

This paper discusses approaches to recognizing the emotional state of a user by analyzing spoken utterances on both the semantic and the signal level. We classify seven emotional states: joy, anger, irritation, fear, disgust, sadness, and a neutral inner state. The introduced methods analyze the wording, the degree of verbosity, the temporal intention rate, and the history of user utterances. As prosodic features, duration, pitch, and energy contribute to a robust recognition. Furthermore, the problem of spotting emotional phrases in human-computer interaction is addressed. User profiling supports adaptation to different cultural understandings of verbally expressed emotions. To justify the applied features, results of usability studies are presented. Finally, fields of application are shown and results are discussed.
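
To illustrate the signal-level side of the approach, the following minimal Python sketch extracts the prosodic features named in the abstract (duration, pitch, and energy) for one utterance. It is not the paper's implementation; the use of librosa, the pitch range, and the simple utterance-level statistics are assumptions made for illustration.

    # Sketch: per-utterance prosodic features (duration, pitch, energy).
    # Assumes librosa and numpy are installed; the statistics chosen here
    # are typical classifier inputs, not the paper's exact feature set.
    import numpy as np
    import librosa

    def prosodic_features(wav_path):
        y, sr = librosa.load(wav_path, sr=None)      # load at native sample rate
        duration = librosa.get_duration(y=y, sr=sr)  # utterance duration in seconds

        # Pitch (F0) contour via probabilistic YIN; unvoiced frames are NaN.
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
        f0_voiced = f0[~np.isnan(f0)]

        # Short-time energy via frame-wise RMS.
        rms = librosa.feature.rms(y=y)[0]

        return {
            "duration_s": duration,
            "f0_mean": float(np.mean(f0_voiced)) if f0_voiced.size else 0.0,
            "f0_range": float(np.ptp(f0_voiced)) if f0_voiced.size else 0.0,
            "energy_mean": float(np.mean(rms)),
            "energy_std": float(np.std(rms)),
        }

Such utterance-level statistics would then be combined with the semantic cues (wording, verbosity, interaction history) before classification into the seven emotional states.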
