Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs

In this paper, we study the problem of addressee and response selection in multi-party conversations. Understanding multi-party conversations is challenging because of complex speaker interactions: multiple speakers exchange messages with each other, playing different roles (sender, addressee, observer), and these roles vary across turns. To tackle this challenge, we propose the Speaker Interaction Recurrent Neural Network (SI-RNN). Whereas the previous state-of-the-art system updated speaker embeddings only for the sender, SI-RNN uses a novel dialog encoder to update speaker embeddings in a role-sensitive way. Additionally, unlike previous work, which selected the addressee and the response separately, SI-RNN selects them jointly by viewing the task as a sequence prediction problem. Experimental results show that SI-RNN significantly improves the accuracy of addressee and response selection, particularly in complex conversations with many speakers and for responses to distant messages posted many turns earlier.
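The two ideas in the abstract, role-sensitive speaker updates and joint addressee/response selection, can be illustrated with a toy sketch. This is not the SI-RNN architecture. The role-specific recurrent units are replaced by a fixed sigmoid gate per role, using hypothetical `ROLE_BIAS` values. Joint selection assumes the factorization P(a, r) = P(a) P(r | a) over precomputed probabilities. It is contrasted with a two-step baseline that commits to the most likely addressee before looking at responses. All function names and numbers here are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical role-specific gate biases standing in for separate
# sender/addressee/observer recurrent parameters (assumed values, not from the paper).
ROLE_BIAS: Dict[str, float] = {"sender": -1.0, "addressee": 0.0, "observer": 2.0}


def role_update(state: Vector, message: Vector, role: str) -> Vector:
    """Toy gated update: keep a role-dependent fraction z of the old state."""
    z = 1.0 / (1.0 + math.exp(-ROLE_BIAS[role]))  # sigmoid gate in (0, 1)
    return [z * s + (1.0 - z) * m for s, m in zip(state, message)]


def update_speakers(
    embeddings: Dict[str, Vector], sender: str, addressee: str, message: Vector
) -> Dict[str, Vector]:
    """Update every speaker after one turn according to their role in that turn."""
    updated: Dict[str, Vector] = {}
    for speaker, state in embeddings.items():
        if speaker == sender:
            role = "sender"
        elif speaker == addressee:
            role = "addressee"
        else:
            role = "observer"
        updated[speaker] = role_update(state, message, role)
    return updated


def select_jointly(
    p_addressee: Dict[str, float], p_response: Dict[str, Dict[str, float]]
) -> Tuple[str, str]:
    """Pick the (addressee, response) pair maximizing log P(a) + log P(r | a)."""
    best: Tuple[str, str] = ("", "")
    best_score = -math.inf
    for a, pa in p_addressee.items():
        for r, pr in p_response[a].items():
            score = math.log(pa) + math.log(pr)
            if score > best_score:
                best, best_score = (a, r), score
    return best


def select_separately(
    p_addressee: Dict[str, float], p_response: Dict[str, Dict[str, float]]
) -> Tuple[str, str]:
    """Two-step baseline: fix the most likely addressee, then its best response."""
    a = max(p_addressee, key=p_addressee.get)
    return a, max(p_response[a], key=p_response[a].get)


if __name__ == "__main__":
    speakers = {"alice": [0.0, 0.0], "bob": [0.0, 0.0], "carol": [0.0, 0.0]}
    # With the assumed biases, the sender moves most toward the message, observers least.
    print(update_speakers(speakers, sender="alice", addressee="bob", message=[1.0, 1.0]))

    p_a = {"bob": 0.55, "carol": 0.45}
    p_r = {"bob": {"r1": 0.6, "r2": 0.4}, "carol": {"r1": 0.9, "r2": 0.1}}
    print(select_separately(p_a, p_r))  # ('bob', 'r1')
    print(select_jointly(p_a, p_r))     # ('carol', 'r1')
```

In this made-up example the joint decoder prefers ("carol", "r1") (probability 0.405) over ("bob", "r1") (probability 0.33), even though "bob" is the single most likely addressee. This is the kind of interaction between the two decisions that a separate or two-step selection cannot exploit.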
