Incorporating Interpersonal Synchronization Features for Automatic Emotion Recognition from Visual and Audio Data during Communication

During social interaction, humans recognize others' emotions from both individual features and interpersonal features. However, most previous automatic emotion recognition techniques have used only individual features and have not tested the value of interpersonal features. In the present study, we asked whether interpersonal features, especially time-lagged synchronization features, benefit the performance of automatic emotion recognition. We explored this question in a main experiment (speaker-dependent emotion recognition) and a supplementary experiment (speaker-independent emotion recognition) by building an individual framework and an interpersonal framework for the visual, audio, and cross-modal settings, respectively. In the main experiment, the interpersonal framework outperformed the individual framework in every modality. The supplementary experiment showed that, even for unknown communication pairs, the interpersonal framework still performed better. We therefore conclude that interpersonal features are useful for boosting the performance of automatic emotion recognition, and we hope this study draws more attention to them.
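To make the idea of a time-lagged synchronization feature concrete, the sketch below computes lagged Pearson correlations between two interlocutors' feature series and picks the lag with the strongest coupling. This is a minimal illustration of the general technique, not the paper's actual feature pipeline; the function name, arguments, and lag range are all assumptions made for the example.

```python
import numpy as np

def lagged_sync_features(a, b, max_lag):
    """Correlate two interlocutors' feature series at lags -max_lag..+max_lag.

    `a`, `b`: 1-D arrays of equal length, e.g. one facial or acoustic
    feature per frame, one array per speaker (illustrative names only).
    Returns a dict {lag: Pearson r} and the lag with the largest |r|.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    corrs = {}
    for lag in range(-max_lag, max_lag + 1):
        # Shift b relative to a and correlate the overlapping segments.
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        corrs[lag] = float(np.corrcoef(x, y)[0, 1])
    best_lag = max(corrs, key=lambda l: abs(corrs[l]))
    return corrs, best_lag
```

The per-lag correlations (or just the best lag and its strength) could then be concatenated with individual features before classification; a windowed variant of the same computation would yield a time-varying synchronization signal.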
