A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows

Advancements in spoken language technologies have allowed users to interact with computers in an increasingly natural manner. However, most conversational agents and dialogue systems do not yet incorporate emotional awareness into the interaction. To support emotion in such settings, conversational agents need social-affective knowledge. In this paper, we present a study of the social-affective process in natural conversations drawn from television talk shows. We analyze occurrences of emotion (emotional responses) and the events that elicit them (emotional triggers). We then use this analysis to build predictive models of how a dialogue system could decide its action and response in an affective interaction. Such knowledge has great potential for incorporating emotion into human-computer interaction. Experiments in two languages, English and Indonesian, show that automatic prediction performance surpasses chance-level accuracy.
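As a concrete illustration of the prediction setup described above, the sketch below trains a simple classifier to predict an emotional-response label from utterance-level feature vectors and compares it against a chance-level baseline. The feature dimensionality, the four-way label set, and the use of scikit-learn's logistic regression are assumptions made for illustration only; the paper's actual features, label inventory, and models may differ, and the synthetic data here exists solely to make the sketch runnable.

```python
# A minimal sketch, assuming utterance-level feature vectors (e.g. acoustic or
# lexical features) and emotion labels are already available. The synthetic data,
# feature dimensionality, label set, and choice of logistic regression are
# illustrative assumptions, not the paper's experimental configuration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical data: 500 utterances, 40-dimensional feature vectors,
# and one of four emotional-response labels per utterance.
X = rng.normal(size=(500, 40))
y = rng.integers(0, 4, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Chance-level baseline: guess labels uniformly at random.
baseline = DummyClassifier(strategy="uniform", random_state=0)
baseline.fit(X_train, y_train)

# Simple discriminative model for predicting the emotional response.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("chance accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
print("model accuracy: ", accuracy_score(y_test, clf.predict(X_test)))
```

With real features and labels, the same comparison indicates whether the learned model surpasses random guessing, which is the evaluation criterion reported in the abstract.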
