Who Is Speaking to Whom? Learning to Identify Utterance Addressee in Multi-Party Conversations

Previous research on dialogue systems generally focuses on conversations between two participants, yet multi-party conversations, which involve more than two participants within one session, present a more complicated but more realistic scenario. In real multi-party conversations, we can observe who is speaking, but the addressee information is not always explicit. In this paper, we aim to tackle the challenge of identifying all the missing addressees in a conversation session. To this end, we introduce a novel who-to-whom (W2W) model, which models users and utterances in the session jointly in an interactive way. We conduct experiments on the benchmark Ubuntu Multi-Party Conversation Corpus, and the experimental results demonstrate that our model outperforms the baselines with consistent improvements.
