Modeling both Intra- and Inter-modal Influence for Real-Time Emotion Detection in Conversations

Over the past decade, emotion analysis in conversations has been studied mainly in textual settings. With the growing popularity of speech and video communication, academia and industry have become increasingly aware of the need for multimodal analysis. As a result, emotion detection in conversations has attracted growing attention not only in the natural language processing (NLP) community but also in the multimodal analysis community. Previous studies typically argue that the emotion of the current utterance in a conversation is strongly influenced by the content, speakers, and emotions of historical utterances, yet they model the influence from the history to the current utterance only within the same modality (intra-modal influence). Intuitively, the clues for emotion detection may lie not in the history of the same modality as the current utterance, but in the history of other modalities (inter-modal influence). In addition, previous studies typically model information propagation only along the conversation flow, whereas bidirectional modeling of information propagation in conversations provides richer clues for emotion detection. Therefore, this paper proposes a bidirectional dynamic dual influence network for real-time emotion detection in conversations, which simultaneously models both intra- and inter-modal influence with bidirectional information propagation between the current utterance and its historical utterances. Detailed experiments demonstrate that our approach significantly advances the state-of-the-art.
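
The abstract does not give implementation details, but the core idea of combining intra- and inter-modal influence with bidirectional propagation can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the module names, dimensions, and the choice of multi-head attention plus a BiGRU are assumptions for exposition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DualInfluenceBlock(nn.Module):
    """Sketch of intra- and inter-modal influence on the current utterance.

    Hypothetical illustration only: the concrete layers and dimensions are
    assumptions, not the design described in the paper.
    """

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Intra-modal influence: the current utterance attends to the history of its own modality.
        self.intra_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-modal influence: the current utterance attends to the history of another modality.
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, current: torch.Tensor, same_modal_history: torch.Tensor,
                other_modal_history: torch.Tensor) -> torch.Tensor:
        # current: (batch, 1, dim); histories: (batch, history_len, dim)
        intra, _ = self.intra_attn(current, same_modal_history, same_modal_history)
        inter, _ = self.inter_attn(current, other_modal_history, other_modal_history)
        return self.fuse(torch.cat([intra, inter], dim=-1))


if __name__ == "__main__":
    batch, hist_len, dim = 2, 5, 128
    text_hist = torch.randn(batch, hist_len, dim)    # textual history utterances
    audio_hist = torch.randn(batch, hist_len, dim)   # acoustic history utterances
    current_text = torch.randn(batch, 1, dim)        # current textual utterance

    # Bidirectional propagation over the history (here a BiGRU, again only illustrative):
    # history states are contextualized in both directions before influence is modeled.
    text_bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
    audio_bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
    text_ctx, _ = text_bigru(text_hist)
    audio_ctx, _ = audio_bigru(audio_hist)

    block = DualInfluenceBlock(dim=dim)
    fused = block(current_text, text_ctx, audio_ctx)
    print(fused.shape)  # torch.Size([2, 1, 128])
```

The fused representation would then feed an utterance-level emotion classifier; in a real-time setting only past utterances are available, so the bidirectional modeling applies within the observed history up to the current turn.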
