Annotation and processing of continuous emotional attributes: Challenges and opportunities

Human emotional and cognitive states evolve with variable intensity and clarity over the course of social interactions and experiences, and they are continuously shaped by multimodal information arriving from the environment and from the other interaction participants. This has motivated the development of a new area within affective computing that treats emotions as continuous variables and examines their representation, annotation, and modeling. In this work, we take as a starting point the continuous emotional annotation that we performed on a large multimodal database, and we discuss annotation challenges, design decisions, annotation results, and lessons learned from this effort in the context of the existing literature. We also discuss a variety of open questions for future research concerning the labeling, combining, and processing of continuous assessments of emotional and cognitive states.
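To make the notion of combining continuous assessments concrete, the following is a minimal illustrative sketch, not the method used in this work: it assumes two hypothetical annotators who provide continuous valence traces in [-1, 1] at a fixed sampling rate, fuses them by a frame-wise mean, and reports their Pearson correlation as a simple agreement measure.

```python
import numpy as np

# Illustrative sketch only (assumed setup, not the paper's protocol):
# two hypothetical annotators rate valence in [-1, 1], sampled at 25 Hz over 10 s.
rate_hz = 25
t = np.arange(0, 10, 1.0 / rate_hz)
rng = np.random.default_rng(0)
annotator_a = np.clip(0.6 * np.sin(0.5 * t) + 0.05 * rng.standard_normal(t.size), -1, 1)
annotator_b = np.clip(0.6 * np.sin(0.5 * t - 0.3) + 0.05 * rng.standard_normal(t.size), -1, 1)

# One common way to combine assessments: frame-wise mean across annotators.
fused = np.mean(np.vstack([annotator_a, annotator_b]), axis=0)

# One simple agreement measure: Pearson correlation between the two traces.
corr = np.corrcoef(annotator_a, annotator_b)[0, 1]

print(f"fused trace: {fused.size} frames, inter-annotator correlation: {corr:.2f}")
```

In practice, such fusion is often preceded by compensating for annotator reaction-time lags (e.g., time-shifting each trace before averaging), and more elaborate agreement and fusion schemes exist; this sketch only fixes the basic ingredients.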
