Prediction of Emotion Change From Speech

The dynamic nature of emotions, which evolve over time, has received relatively little attention in automatic emotion recognition systems to date. Although within-utterance information about emotion change has recently received some attention, open questions remain, such as how to define ground truth for emotion change (delta emotion), how to predict the extent of emotion change from speech, and how well change can be predicted relative to absolute emotion ratings. In this article, we investigate speech-based automatic systems for continuous prediction of the extent of emotion change in arousal and valence. We propose regression (smoothed) deltas as ground truth for emotion change. These yielded considerably higher inter-rater reliability than first-order deltas, a commonly used approach in previous research, and represent a more appropriate way to derive annotations for emotion change research; this finding applies beyond speech-based systems. In addition, we explore the first system design for continuous emotion change prediction from speech. Experimental results under the Output-Associative Relevance Vector Machine framework show that, on the RECOLA database, changes in emotion ratings may be predicted better than absolute emotion ratings, achieving concordance correlation coefficients of 0.74 vs. 0.71 for arousal and 0.41 vs. 0.37 for valence. However, further work is needed to achieve effective emotion change prediction on the SEMAINE database, owing to the large number of non-change frames in the absolute emotion ratings.
