Multi-modal Multi-label Emotion Detection with Modality and Label Dependence

As an important research issue in the natural language processing community, multi-label emotion detection has drawn increasing attention in recent years. However, almost all existing studies focus on a single modality (e.g., the textual modality). In this paper, we address multi-label emotion detection in a multi-modal scenario. In this scenario, we must consider both the dependence among different labels (label dependence) and the dependence between each predicted label and the different modalities (modality dependence). Specifically, we propose a multi-modal sequence-to-set approach to effectively model both kinds of dependence in multi-modal multi-label emotion detection. A detailed evaluation demonstrates the effectiveness of our approach.
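To make the two kinds of dependence concrete, the following is a hypothetical, stdlib-only sketch (not the paper's actual model) of how a sequence-to-set decoder might couple them: at each step, attention over the modality features is conditioned on a running summary of already-emitted labels (label dependence), each modality contributes to the fused representation in proportion to its attention weight (modality dependence), and emitted labels are masked out so the output is a set. The function name `seq2set_decode`, the `stop_thresh` parameter, and the toy feature/label vectors are all illustrative assumptions.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def seq2set_decode(modal_feats, label_embs, stop_thresh=0.0, max_steps=None):
    """Greedy sequence-to-set decoding sketch (illustrative, not the paper's model).

    modal_feats: dict mapping modality name -> feature vector
    label_embs:  dict mapping label name -> embedding vector
    """
    dim = len(next(iter(modal_feats.values())))
    state = [0.0] * dim          # running summary of labels emitted so far
    emitted = []
    max_steps = max_steps or len(label_embs)
    for _ in range(max_steps):
        # modality attention conditioned on the decoder state (modality dependence)
        names = list(modal_feats)
        weights = softmax([dot(modal_feats[n], state) for n in names])
        fused = [sum(w * modal_feats[n][i] for w, n in zip(weights, names))
                 for i in range(dim)]
        # score only labels not yet emitted, so the output is a set
        cands = [(dot(label_embs[l], fused), l)
                 for l in label_embs if l not in emitted]
        if not cands:
            break
        best_score, best = max(cands)
        if best_score < stop_thresh:
            break  # implicit stop symbol: no remaining label scores high enough
        emitted.append(best)
        # fold the emitted label into the state (label dependence)
        state = [s + e for s, e in zip(state, label_embs[best])]
    return emitted
```

For example, with a text feature aligned with one emotion and an audio feature aligned with another, the decoder emits both positively-scored labels and stops before the negatively-scored one; a trained model would learn these embeddings and the attention parameters rather than using fixed vectors.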
