Modeling Long-Range Context for Concurrent Dialogue Acts Recognition

In dialogues, an utterance is a chain of consecutive sentences produced by one speaker which ranges from a short sentence to a thousand-word post. When studying dialogues at the utterance level, it is not uncommon that an utterance would serve multiple functions. For instance, "Thank you. It works great." expresses both gratitude and positive feedback in the same utterance. Multiple dialogue acts (DA) for one utterance breeds complex dependencies across dialogue turns. Therefore, DA recognition challenges a model's predictive power over long utterances and complex DA context. We term this problem Concurrent Dialogue Acts (CDA) recognition. Previous work on DA recognition either assumes one DA per utterance or fails to realize the sequential nature of dialogues. In this paper, we present an adapted Convolutional Recurrent Neural Network (CRNN) which models the interactions between utterances of long-range context. Our model significantly outperforms existing work on CDA recognition on a tech forum dataset.

[1]  Haizhou Li,et al.  Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking , 2016, ACL.

[2]  Peter Sanders,et al.  Highway Hierarchies Hasten Exact Shortest Path Queries , 2005, ESA.

[3]  Zhiyong Luo,et al.  Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts , 2016, COLING.

[4]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[5]  Yiming Yang,et al.  Deep Learning for Extreme Multi-label Text Classification , 2017, SIGIR.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  W. Bruce Croft,et al.  User Intent Prediction in Information-seeking Conversations , 2019, CHIIR.

[8]  Sandra M. Aluísio,et al.  Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks , 2016, EACL.

[9]  Harshit Kumar,et al.  Dialogue-act-driven Conversation Model : An Experimental Study , 2018, COLING.

[10]  Jianping Yin,et al.  Relational recurrent neural networks for polyphonic sound event detection , 2019, Multimedia Tools and Applications.

[11]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[12]  Wojciech Zaremba,et al.  Learning to Execute , 2014, ArXiv.

[13]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[14]  Heikki Huttunen,et al.  Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[15]  Deng Cai,et al.  Dialogue Act Recognition via CRF-Attentive Structured Network , 2017, SIGIR.

[16]  Harshit Kumar,et al.  Dialogue Act Sequence Labeling using Hierarchical encoder with CRF , 2017, AAAI.

[17]  Mark Sandler,et al.  Convolutional recurrent neural networks for music classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Elizabeth Shriberg,et al.  Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual , 1997 .

[19]  Phil Blunsom,et al.  Recurrent Convolutional Neural Networks for Discourse Compositionality , 2013, CVSM@ACL.