Hierarchical Interactive Matching Network for Multi-turn Response Selection in Retrieval-Based Chatbots

We study multi-turn response selection in open-domain dialogue systems, where the best-matched response is selected according to a conversation context. The widely used sequential matching models match a response candidate with each utterance in the conversation context through a representation-interaction-aggregation framework, but they pay insufficient attention to inter-utterance dependencies at the representation stage and to global information guidance at the interaction stage. As a result, the matching features of utterance-response pairs may be one-sided or even noisy. In this paper, we propose a hierarchical interactive matching network (HIMN) that models both aspects in a unified framework. In HIMN, we model the dependencies between adjacent utterances in the context with a multi-level attention mechanism. A two-level hierarchical interactive matching module then introduces global context information to help distill important matching features of each utterance-response pair at the interaction stage. Finally, the matching features from the two levels are merged through a gating mechanism. Empirical results on both the Douban Corpus and the E-commerce Corpus show that HIMN significantly outperforms competitive baseline models for multi-turn response selection.
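To make the final fusion step concrete, the sketch below shows one plausible form of the gating mechanism described above: a learned gate that blends utterance-level matching features with context-level matching features. This is a minimal illustration under assumed tensor shapes and layer sizes, not the authors' implementation; the module name `GatedFusion` and its parameters are hypothetical.

```python
# Minimal sketch (assumption): gated fusion of two levels of matching
# features, in the spirit of the merging step described in the abstract.
# Names, shapes, and layer sizes are illustrative only.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # The gate decides, per feature dimension, how much of each
        # matching-feature level to keep.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat:  utterance-response matching features, shape (batch, dim)
        # global_feat: context-level matching features,      shape (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([local_feat, global_feat], dim=-1)))
        return g * local_feat + (1.0 - g) * global_feat


# Example usage with assumed feature dimension 256:
# fused = GatedFusion(256)(local_feat, global_feat)
```

A sigmoid gate of this kind lets the model weight the two levels differently for each example, which is one common way to merge complementary matching signals; the paper's actual gating formulation may differ.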
