GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which goes beyond language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training a chatbot end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework employs hierarchical reinforcement learning (HRL), where a high-level policy determines sub-goals that guide the conversation towards the final goal, and a low-level policy fulfills those sub-goals by generating the corresponding response utterances. In experiments on a real-world dialogue dataset for anti-fraud in the financial domain, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
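To make the two-level structure concrete, the following is a minimal sketch (not the authors' code) of an HRL setup of the kind described above: a high-level policy that picks a discrete sub-goal from the encoded dialogue context, and a low-level policy that generates the response conditioned on that sub-goal. All module names, dimensions, and the GRU-based decoder are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch of a hierarchical dialogue policy (assumed design, not GoChat's actual code).
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Maps an encoded dialogue context to a distribution over discrete sub-goals."""

    def __init__(self, context_dim: int, num_subgoals: int):
        super().__init__()
        self.head = nn.Linear(context_dim, num_subgoals)

    def forward(self, context: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.head(context))


class LowLevelPolicy(nn.Module):
    """Generates a response token-by-token, conditioned on the context and the chosen sub-goal."""

    def __init__(self, vocab_size: int, context_dim: int, num_subgoals: int, hidden: int = 256):
        super().__init__()
        self.subgoal_emb = nn.Embedding(num_subgoals, hidden)
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.init_proj = nn.Linear(context_dim + hidden, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context: torch.Tensor, subgoal: torch.Tensor,
                response_tokens: torch.Tensor) -> torch.Tensor:
        # Initialize the decoder state from the dialogue context and the sub-goal embedding.
        h0 = torch.tanh(self.init_proj(
            torch.cat([context, self.subgoal_emb(subgoal)], dim=-1)))
        emb = self.token_emb(response_tokens)
        out, _ = self.decoder(emb, h0.unsqueeze(0))
        return self.out(out)  # per-step logits over the vocabulary


# Usage: sample a sub-goal for each dialogue, then score a teacher-forced response under it.
context = torch.randn(4, 128)                      # encoded dialogue history (batch of 4)
high = HighLevelPolicy(context_dim=128, num_subgoals=8)
low = LowLevelPolicy(vocab_size=5000, context_dim=128, num_subgoals=8)
subgoal = high(context).sample()                   # high-level action: one sub-goal per dialogue
logits = low(context, subgoal, torch.randint(0, 5000, (4, 10)))  # low-level action: the utterance
```

In such a setup, both levels would typically be trained with policy-gradient updates, the high level rewarded by progress towards the final goal and the low level by how well its utterances realize the assigned sub-goal; the specific reward shaping here is an assumption, not taken from the paper.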
