Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection

Multi-modal utterance-level emotion detection has become an active research topic in both the multi-modal analysis and natural language processing communities. Unlike traditional single-label multi-modal sentiment analysis, multi-modal emotion detection is naturally a multi-label problem, since an utterance often conveys multiple emotions. Existing studies normally focus on multi-modal fusion alone and transform multi-label emotion classification into multiple independent binary classification problems. As a result, they largely ignore two kinds of important dependency information: (1) modality-to-label dependency, where different emotions are inferred from different modalities, i.e., each modality contributes differently to each potential emotion; and (2) label-to-label dependency, where some emotions are more likely to co-occur than conflicting ones. To model both kinds of dependency simultaneously, we propose a unified approach, the multi-modal emotion set generation network (MESGN), which generates an emotion set for each utterance. Specifically, we first employ a cross-modal transformer encoder to capture cross-modal interactions among different modalities, and a standard transformer encoder to capture temporal information within each modality-specific sequence given these interactions. We then design a transformer-based discriminative decoding module equipped with modality-to-label attention to handle the modality-to-label dependency, and employ a reinforced decoding algorithm with self-critical learning to handle the label-to-label dependency. Finally, we validate the proposed MESGN architecture on both the word-level aligned and unaligned settings of a multi-modal dataset. Detailed experimentation shows that MESGN effectively improves the performance of multi-modal multi-label emotion detection.
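
To make the decoding scheme described above concrete, the following is a minimal, hypothetical PyTorch sketch of an MESGN-style pipeline: a cross-modal transformer encoder in which one modality attends to another, followed by a transformer decoder that emits emotion labels one at a time until an end-of-set token, so the output is a variable-sized label set rather than independent binary decisions. All module names, dimensions, and the greedy decoding loop are illustrative assumptions rather than the authors' implementation; in particular, the self-critical reinforcement step used for label-to-label dependency is omitted here.

# Hypothetical sketch of an MESGN-style encoder/decoder; not the reference implementation.
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    """Lets a target modality (e.g., text) attend to a source modality (e.g., audio)."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=1)

    def forward(self, target_seq, source_seq):
        # Cross-modal attention: queries from the target modality, keys/values from the source.
        fused, _ = self.cross_attn(target_seq, source_seq, source_seq)
        # Standard transformer encoder captures temporal structure of the fused sequence.
        return self.temporal(fused)

class EmotionSetDecoder(nn.Module):
    """Greedy transformer decoder that emits one emotion label per step until <eos>."""
    def __init__(self, n_labels: int = 6, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels + 2, d_model)  # labels plus <bos>/<eos>
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), num_layers=1)
        self.out = nn.Linear(d_model, n_labels + 2)
        self.bos, self.eos = n_labels, n_labels + 1

    @torch.no_grad()
    def generate(self, memory, max_len: int = 6):
        tokens = [self.bos]
        for _ in range(max_len):
            tgt = self.label_emb(torch.tensor([tokens]))
            h = self.decoder(tgt, memory)            # cross-attention over encoder memory
            next_id = self.out(h[:, -1]).argmax(-1).item()
            if next_id == self.eos or next_id in tokens:
                break                                # stop on <eos> or a repeated label
            tokens.append(next_id)
        return tokens[1:]                            # the predicted emotion label set

if __name__ == "__main__":
    text, audio = torch.randn(1, 20, 64), torch.randn(1, 50, 64)
    memory = CrossModalEncoder()(text, audio)        # text attends to audio
    print(EmotionSetDecoder().generate(memory))      # e.g., [2, 5] -> two predicted emotions

In this sketch, the decoder's cross-attention over the encoder memory plays the role of modality-to-label attention, and stopping at the end-of-set token (or on a repeated label) lets the number of predicted emotions vary per utterance.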
