Emotion Unfolding and Affective Scenes: A Case Study in Spoken Conversations

The manifestation of human emotions evolves over time and space. Most affective computing research is limited to associating context-free signal segments, such as utterances and images, with basic emotions. In this paper, we discuss the hypothesis that interpreting emotions requires a conceptual description of their dynamics within the context of their manifestations. We describe the unfolding of emotions through the proposed affective scene framework. An affective scene is defined in terms of who first expresses a variation in their emotional state in a conversation, how this affects the other speaker's emotional appraisal and response, and which modifications occur from the initial through the final state of the scene. We apply and evaluate this conceptual framework on real human-human conversations drawn from call centers. We show that the automatic classification of affective scenes achieves more than satisfactory results and that it benefits from acoustic, lexical, and psycholinguistic features of the speech and linguistic signals.
