A study on the consistency of human perception and machine recognition of an emotional corpus

Emotion plays a critical role in human interaction. Listeners can perceive the emotional state of speakers from their facial expressions, gestures, and/or speech. In this paper, we investigate the consistency between the emotion intended by speakers and the emotion perceived by listeners in a newly recorded corpus, and we compare these human judgments with the output of a speech emotion recognition system. The results indicate that female speakers express emotion more recognizably than male speakers, while female listeners show greater variance in perceiving emotion. Another finding is that recognition rates increase as utterance duration grows.
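A common way to quantify the kind of intended-versus-perceived consistency the abstract describes is a raw agreement rate together with a chance-corrected statistic such as Cohen's kappa. The Python sketch below illustrates this computation; the emotion categories, label pairs, and use of kappa are illustrative assumptions, not details taken from the paper's corpus or method.

    from collections import Counter

    # Hypothetical emotion categories; the paper's actual label set is not given here.
    EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

    def agreement_stats(intended, perceived):
        """Return (raw agreement, Cohen's kappa) for paired label sequences."""
        assert len(intended) == len(perceived) and intended
        n = len(intended)
        # Proportion of utterances where the perceived label matches the intended one.
        observed = sum(i == p for i, p in zip(intended, perceived)) / n
        # Chance agreement expected from the two marginal label distributions.
        ci, cp = Counter(intended), Counter(perceived)
        expected = sum(ci[e] * cp[e] for e in EMOTIONS) / (n * n)
        kappa = (observed - expected) / (1 - expected)
        return observed, kappa

    # Made-up example: intended vs. perceived labels for six utterances.
    intended  = ["anger", "anger", "happiness", "sadness", "neutral", "sadness"]
    perceived = ["anger", "sadness", "happiness", "sadness", "neutral", "neutral"]
    obs, kappa = agreement_stats(intended, perceived)
    print(f"agreement = {obs:.2f}, kappa = {kappa:.2f}")

Running this on the made-up pairs above prints an agreement of 0.67 and a kappa of 0.56; the same two statistics could be computed per speaker gender or per utterance-duration bin to mirror the comparisons the abstract reports.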
