Modeling Local Contexts for Joint Dialogue Act Recognition and Sentiment Classification with Bi-channel Dynamic Convolutions

In this paper, we aim to improve joint dialogue act recognition (DAR) and sentiment classification (SC) by fully modeling the local contexts of utterances. First, we employ the dynamic convolution network (DCN) as the utterance encoder to capture the dialogue contexts. Further, we propose a novel context-aware dynamic convolution network (CDCN) to better leverage the local contexts when dynamically generating kernels. We extend our frameworks into bi-channel versions (i.e., BDCN and BCDCN) under multi-task learning to perform DAR and SC jointly. The two channels learn their own feature representations for DAR and SC, respectively, but with latent interaction. In addition, we suggest enhancing the tasks by employing the DiaBERT language model. Our frameworks obtain state-of-the-art performance against all baselines on two benchmark datasets, demonstrating the importance of modeling the local contexts.

1 Introduction

Dialogue act recognition (DAR) aims to detect the speaker’s intention (e.g., question, agreement or statement) in each utterance, which helps dialog systems produce appropriate responses (Inui et al., 2001). Recent studies have further shown that simultaneously recognizing the dialogue act and detecting the sentiment in a dialog leads to a better grasp of the speaker’s intention (Cerisara et al., 2018; Qin et al., 2020). The two tasks are closely related and benefit each other when performed jointly. On the one hand, DAR provides clues for sentiment classification (SC). On the other hand, sentiment transitions can benefit dialogue act prediction. Taking the utterances in Table 1 as examples, once the dialogue act Agreement is assigned, the utterance commonly expresses the same sentiment as the previous utterance. Meanwhile, when the speaker changes the sentiment from Negative to Neutral, the dialogue act tends to transition into Statement.
| Speaker | Utterance | Dialogue Act | Sentiment |
|---|---|---|---|
| A | Does anyone ever feel anxious and empty at the same time? | Question | Negative |
| B | All the time. | Answer | Negative |
| B | I feel like I’m losing my mind a little bit. | Statement | Negative |
| A | Relatable. I’m always anxious and if I’m not feeling empty or depressed I’m angry. Also usually dissociating. | Agreement | Negative |
| B | I needa go smoke. | Statement | Neutral |

Table 1: Example utterances from the Mastodon dataset for joint dialogue act and sentiment detection.

Prior works model joint DAR and SC as a sequence labeling problem, all relying on recurrent-style neural models, e.g., the Long Short-Term Memory network (LSTM) (Chen et al., 2018; Raheja and Tetreault, 2019). However, one crucial drawback of these models is failing to fully incorporate

Code is publicly available at https://github.com/ljynlp/BCDCN.
†Equally contributed. ‡Corresponding author.
This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.
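To make the dynamic convolution idea behind the DCN encoder concrete, the following is a minimal NumPy sketch. It assumes the simplified single-head formulation of dynamic convolution from reference [18]: each position predicts its own softmax-normalized kernel from its input vector and applies it to a local window. The function name `dynamic_conv`, the projection matrix `w_kernel`, and the toy dimensions are hypothetical and are not taken from the released BCDCN code. The context-aware kernel generation of CDCN and the bi-channel design are not shown.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_conv(x, w_kernel, k):
    """Simplified single-head dynamic convolution (illustrative only).

    x        : (n, d) array, one d-dimensional vector per utterance.
    w_kernel : (d, k) array, hypothetical projection that predicts
               k kernel weights from each position's own vector.
    k        : odd kernel width, i.e. the size of the local window.

    Returns the (n, d) convolved sequence and the (n, k) kernels.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the window is centered")
    n, _ = x.shape
    pad = k // 2
    # One kernel per position, normalized over the window.
    kernels = softmax(x @ w_kernel, axis=-1)
    # Zero-pad the sequence so every position has a full window.
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(n):
        window = padded[t:t + k]       # (k, d) local context of position t
        out[t] = kernels[t] @ window   # weighted sum over the window
    return out, kernels


# Toy usage: five utterances (as in Table 1), 8-dim vectors, window of 3.
rng = np.random.default_rng(0)
utterances = rng.normal(size=(5, 8))
projection = rng.normal(size=(8, 3))
encoded, kernels = dynamic_conv(utterances, projection, k=3)
```

Because the kernel at each position depends on that position's input, the weighting of neighboring utterances changes along the dialogue, in contrast to a standard convolution with one fixed kernel.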

[1]  Harshit Kumar,et al.  Dialogue Act Sequence Labeling using Hierarchical encoder with CRF , 2017, AAAI.

[2]  Xiaoyu Shen,et al.  DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset , 2017, IJCNLP.

[3]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Andreas Stolcke,et al.  Dialogue act modeling for automatic tagging and recognition of conversational speech , 2000, CL.

[5]  Yann LeCun,et al.  Very Deep Convolutional Networks for Text Classification , 2016, EACL.

[6]  Bo Huang,et al.  A New Method of Region Embedding for Text Classification , 2018, ICLR.

[7]  Mirella Lapata,et al.  Text Summarization with Pretrained Encoders , 2019, EMNLP.

[8]  Donghong Ji,et al.  Latent Emotion Memory for Multi-Label Emotion Classification , 2020, AAAI.

[9]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[10]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[11]  Matteo Manica,et al.  Guiding attention in Sequence-to-sequence models for Dialogue Act prediction , 2020, AAAI.

[12]  Regina Barzilay,et al.  Molding CNNs for text: non-linear, non-consecutive convolutions , 2015, EMNLP.

[13]  Anton Nijholt,et al.  Dialogue Act Recognition with Bayesian Networks for Dutch Dialogues , 2002, SIGDIAL Workshop.

[14]  Deng Cai,et al.  Dialogue Act Recognition via CRF-Attentive Structured Network , 2017, SIGIR.

[15]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[16]  Joel R. Tetreault,et al.  Dialogue Act Classification with Context-Aware Self-Attention , 2019, NAACL.

[17]  Yafeng Ren,et al.  Implicit Objective Network for Emotion Detection , 2019, NLPCC.

[18]  Yann Dauphin,et al.  Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.

[19]  Rada Mihalcea,et al.  DialogueRNN: An Attentive RNN for Emotion Detection in Conversations , 2018, AAAI.

[20]  Harksoo Kim,et al.  Integrated neural network model for identifying speech acts, predicators, and sentiments of dialogue utterances , 2018, Pattern Recognit. Lett.

[21]  Seung-won Hwang,et al.  Cold-Start Aware User and Product Attention for Sentiment Classification , 2018, ACL.

[22]  Donghong Ji,et al.  Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus , 2020, ACL.

[23]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[24]  Bipin Indurkhya,et al.  A case-based natural language dialogue system using dialogue act , 2001, 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace.

[25]  Piroska Lendvai,et al.  Token-based Chunking of Turn-internal Dialogue Act Sequences , 2007, SIGDIAL.

[26]  Shafiq R. Joty,et al.  Dialogue Act Recognition in Synchronous and Asynchronous Conversations , 2013, SIGDIAL Conference.

[27]  Liang Xiao,et al.  Cross-Domain NER using Cross-Domain Language Modeling , 2019, ACL.

[28]  Yue Zhang,et al.  Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings , 2016, AAAI.

[29]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[30]  Gina-Anne Levow,et al.  Dialog act tagging with support vector machines and hidden Markov models , 2006, INTERSPEECH.

[31]  Yangming Li,et al.  DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification , 2020, AAAI.

[32]  Christophe Cerisara,et al.  Multi-task dialog act and sentiment recognition on Mastodon , 2018, COLING.

[33]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[34]  Lukasz Kaiser,et al.  Depthwise Separable Convolutions for Neural Machine Translation , 2017, ICLR.

[35]  Phil Blunsom,et al.  Recurrent Convolutional Neural Networks for Discourse Compositionality , 2013, CVSM@ACL.