The JESTKOD database: an affective multimodal database of dyadic interactions

In human-to-human communication, gesture and speech co-exist in time with tight synchrony, and gestures are often used to complement or to emphasize speech. In human–computer interaction systems, natural, affective and believable use of gestures would be a valuable component for emphasizing human-centered aspects. However, natural and affective multimodal data for studying computational models of gesture and speech remain limited. In this study, we introduce the JESTKOD database, which consists of speech and full-body motion-capture recordings of dyadic interactions under agreement and disagreement scenarios. Participants in the dyadic interactions are native Turkish speakers, and the recordings of each participant are rated in a dimensional affect space. We present our multimodal data collection and annotation process, as well as our preliminary experimental studies on agreement/disagreement classification of dyadic interactions using body gesture and speech data. The JESTKOD database provides a valuable resource for investigating gesture and speech towards the design of more natural and affective human–computer interaction systems.
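As a rough illustration of the kind of agreement/disagreement classification experiment described above, the sketch below summarizes a session's motion-capture and speech streams with simple statistical functionals and feeds them to an SVM. It is a minimal, hypothetical example, not the authors' actual pipeline: the feature set, array shapes, and classifier choice are illustrative assumptions.

```python
# Minimal sketch (illustrative only) of agreement/disagreement classification
# from speech and body-motion features; not the JESTKOD protocol itself.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def session_features(mocap, speech):
    """Summarize one interaction session with simple functionals.

    mocap  : (T, J, 3) array of joint positions over T frames
    speech : (T, D) array of frame-level acoustic features (e.g. MFCCs, pitch)
    """
    # Frame-to-frame joint displacement as a crude measure of gestural activity.
    motion_energy = np.linalg.norm(np.diff(mocap, axis=0), axis=2).mean(axis=1)
    feats = [motion_energy.mean(), motion_energy.std(), motion_energy.max()]
    # Mean and variability of each acoustic dimension over the session.
    feats.extend(speech.mean(axis=0))
    feats.extend(speech.std(axis=0))
    return np.asarray(feats)

# Synthetic data standing in for annotated sessions (1 = agreement, 0 = disagreement).
rng = np.random.default_rng(0)
X = np.stack([
    session_features(rng.normal(size=(500, 25, 3)), rng.normal(size=(500, 13)))
    for _ in range(40)
])
y = rng.integers(0, 2, size=40)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

In practice the session-level functionals would be computed from the annotated JESTKOD recordings rather than synthetic arrays, and the classifier and features would follow the paper's experimental setup.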
