Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing

As an important research issue in the affective computing community, multi-modal emotion recognition has become a hot topic in recent years. However, almost all existing studies treat the task as multiple binary classification problems, one per emotion, and rely on complete time-series data. In this paper, we focus on multi-modal emotion recognition in a multi-label scenario, considering not only the label-to-label dependency but also the feature-to-label and modality-to-label dependencies. In particular, we propose a heterogeneous hierarchical message passing network to effectively model these dependencies. Furthermore, we construct a new multi-modal multi-label emotion dataset based on partial time-series content to show the strong generalization of our model. Detailed evaluation demonstrates the effectiveness of our approach.
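To make the three dependency types concrete, the sketch below illustrates one heterogeneous message-passing step over feature, modality, and label nodes in PyTorch. It is a minimal illustration of the general idea only, not the paper's actual architecture: the per-edge-type linear transforms, mean aggregation, and GRU-style label update are our own illustrative assumptions.

```python
import torch
import torch.nn as nn


class HeterogeneousMessagePassing(nn.Module):
    """Minimal sketch: one message-passing step from feature, modality, and
    label nodes into the label nodes. Illustrative only; node/edge types and
    the gated update are assumptions, not the authors' exact design."""

    def __init__(self, dim: int):
        super().__init__()
        # Separate transform per edge type (heterogeneous messages).
        self.feat_to_label = nn.Linear(dim, dim)
        self.modality_to_label = nn.Linear(dim, dim)
        self.label_to_label = nn.Linear(dim, dim)
        # Gated update of label node states.
        self.update = nn.GRUCell(dim, dim)

    def forward(self, feat_nodes, modality_nodes, label_nodes):
        # feat_nodes:     (num_features,   dim)  e.g. utterance-level features
        # modality_nodes: (num_modalities, dim)  e.g. text / audio / video summaries
        # label_nodes:    (num_labels,     dim)  learnable label embeddings
        m_feat = self.feat_to_label(feat_nodes).mean(dim=0, keepdim=True)
        m_mod = self.modality_to_label(modality_nodes).mean(dim=0, keepdim=True)
        m_lab = self.label_to_label(label_nodes).mean(dim=0, keepdim=True)
        # Broadcast the aggregated message to every label node and update.
        message = (m_feat + m_mod + m_lab).expand(label_nodes.size(0), -1)
        return self.update(message, label_nodes)


if __name__ == "__main__":
    dim, num_labels = 64, 6
    layer = HeterogeneousMessagePassing(dim)
    feats = torch.randn(10, dim)    # 10 utterance-level feature nodes
    mods = torch.randn(3, dim)      # text, audio, video modality nodes
    labels = torch.randn(num_labels, dim)
    print(layer(feats, mods, labels).shape)  # torch.Size([6, 64])
```

Stacking several such steps, each refining the label states from a different neighborhood, would give the hierarchical flavor described in the abstract.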
