Automatic Labeling of Affective Scenes in Spoken Conversations

Research in affective computing has mainly focused on analyzing human emotional states as they are perceivable within limited contexts, such as individual speech utterances. In this study, we focus on the dynamic transitions of emotional states that appear throughout a conversation and investigate computational models to automatically label these states using the proposed affective scene framework. An affective scene comprises the complete sequence of emotional states in a conversation, from its start to its end. Affective scene instances capture different patterns of behavior, such as who manifests an emotional state, when it is manifested, and which changes occur due to the influence of one interlocutor's emotion on another. In this paper, we present the design and training of an automatic affective scene segmentation and classification system for spoken conversations. We comparatively evaluate the contributions of different feature types in the acoustic, lexical, and psycholinguistic space, as well as their correlations and combinations.
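The abstract describes classifying emotional segments by combining acoustic, lexical, and psycholinguistic features. The sketch below is an illustration only, not the authors' pipeline: it assumes precomputed per-segment feature vectors, uses synthetic placeholder data and labels, and trains an early-fusion SVM with scikit-learn. All dimensions, the learner choice, and the evaluation metric are assumptions for the sake of the example.

```python
# Illustrative sketch only: early fusion of per-segment feature blocks
# (acoustic, lexical, psycholinguistic) followed by an SVM classifier.
# The data below is synthetic; the paper's actual features, classifier,
# and evaluation protocol may differ.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_segments = 200  # hypothetical number of emotional segments

# Hypothetical per-segment feature blocks (dimensions are placeholders).
acoustic = rng.normal(size=(n_segments, 384))         # e.g., frame-level functionals
lexical = rng.normal(size=(n_segments, 100))          # e.g., bag-of-words features
psycholinguistic = rng.normal(size=(n_segments, 64))  # e.g., lexicon-based scores

# Early fusion: concatenate the feature blocks for each segment.
X = np.hstack([acoustic, lexical, psycholinguistic])
y = rng.integers(0, 3, size=n_segments)  # placeholder emotion labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
print("macro-F1 on synthetic data:", scores.mean())
```

Under this setup, the relative contribution of each feature type could be probed by training the same classifier on one block at a time and comparing it against the fused representation; this mirrors the kind of comparative evaluation the abstract mentions, but the specific procedure used in the paper is not detailed in this section.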
