Detection of Negative Emotional States in Real-World Scenario

In the present work we evaluate a detector of negative emotional states (DNES) designed to enhance a spoken dialogue system operating in a smart-home environment. The DNES component is based on Gaussian mixture models (GMMs) and a set of commonly used speech features. In a comprehensive performance evaluation we utilized a well-known acted speech database as well as real-world speech recordings collected during interactions of naive users with our smart-home spoken dialogue system. The experimental results show that the accuracy of recognizing negative emotions on the real-world data is lower than that reported when testing on the acted speech database, yet still promising, considering that humans are often unable to distinguish the emotions of others from speech alone.
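
The abstract names GMMs and commonly used speech features but does not detail the classifier setup, so the following is only a minimal illustrative sketch of a GMM-based negative-emotion detector, not the authors' implementation. It assumes frame-level MFCC features extracted with librosa, one diagonal-covariance GMM per class (negative vs. non-negative) trained with scikit-learn, and utterance-level classification by average frame log-likelihood; the function names and parameter values (n_mfcc=13, n_components=16) are assumptions made for illustration only.

```python
# Illustrative sketch of a two-class GMM emotion detector
# (negative vs. non-negative), assuming frame-level MFCC features.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def extract_features(path, sr=16000, n_mfcc=13):
    """Return frame-level MFCCs as a (frames x n_mfcc) matrix."""
    signal, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T


def train_class_gmm(feature_matrices, n_components=16):
    """Fit one diagonal-covariance GMM on all frames of one emotion class."""
    frames = np.vstack(feature_matrices)
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(frames)


def classify(path, gmm_negative, gmm_other):
    """Label an utterance by the higher average frame log-likelihood."""
    feats = extract_features(path)
    ll_negative = gmm_negative.score_samples(feats).mean()
    ll_other = gmm_other.score_samples(feats).mean()
    return "negative" if ll_negative > ll_other else "non-negative"


# Example usage (hypothetical file lists):
# gmm_neg = train_class_gmm([extract_features(p) for p in negative_wavs])
# gmm_oth = train_class_gmm([extract_features(p) for p in other_wavs])
# print(classify("utterance.wav", gmm_neg, gmm_oth))
```

Training one GMM per class and comparing log-likelihoods is a common baseline for this kind of detector; the actual feature set and model sizes used in the paper may differ.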
