Speech Emotion Analysis: Exploring the Role of Context

Automated analysis of human affective behavior has attracted increasing attention in recent years. With the research shift toward spontaneous behavior, many challenges have surfaced, ranging from database collection strategies to the use of new feature sets (e.g., lexical cues in addition to prosodic features). The use of contextual information, however, is rarely addressed in the field of affect-expression recognition, even though affect recognition by humans is clearly influenced by context. Our contribution in this paper is threefold. First, we introduce a novel set of features based on cepstrum analysis of pitch and intensity contours. We evaluate the usefulness of these features on two databases: the Berlin Database of Emotional Speech (EMO-DB) and a locally collected audiovisual database recorded in a car setting (CVRRCar-AVDB). Using tenfold stratified cross-validation, the overall recognition accuracy is over 84% for seven emotion classes on EMO-DB and over 87% for three emotion classes on CVRRCar-AVDB. Second, we introduce the collection of a new audiovisual database in an automobile setting (CVRRCar-AVDB); in the current study, we use only the audio channel of this database. Third, we systematically analyze the effects of different kinds of context on the two databases: subject and text context through speaker-dependent/-independent and text-dependent/-independent analyses on EMO-DB, and gender context on both EMO-DB and CVRRCar-AVDB. The results of these analyses are promising.
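To make the feature idea concrete, the following is a minimal sketch (not the authors' exact formulation) of taking the real cepstrum of a frame-level pitch (F0) contour; an intensity contour would be treated the same way. The contour extraction itself (e.g., Praat-style autocorrelation pitch tracking) is assumed to have been done already, and the function name, coefficient count, and synthetic contour below are illustrative assumptions only.

```python
import numpy as np

def contour_cepstrum(contour: np.ndarray, n_coeffs: int = 12) -> np.ndarray:
    """Real cepstrum of a (pitch or intensity) contour: a hypothetical
    helper, not the paper's implementation.

    cepstrum = IFFT( log |FFT(contour)| ); the first few coefficients
    summarize the coarse shape of the contour.
    """
    spectrum = np.fft.rfft(contour - contour.mean())   # remove DC offset
    log_mag = np.log(np.abs(spectrum) + 1e-10)         # avoid log(0)
    cepstrum = np.fft.irfft(log_mag)
    return cepstrum[:n_coeffs]

# Example: a synthetic rising-falling F0 contour over 100 frames
f0 = 120 + 40 * np.sin(np.linspace(0, np.pi, 100))     # Hz per frame
features = contour_cepstrum(f0)
print(features.shape)  # (12,)
```

The appeal of such features is that they compress the global shape of a prosodic contour into a small, fixed-length vector, which is convenient for the kind of classifier-based evaluation the paper reports.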
