GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which goes beyond language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training a chatbot end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework employs hierarchical reinforcement learning (HRL), where a high-level policy determines sub-goals that guide the conversation towards the final goal, and a low-level policy fulfills those sub-goals by generating the corresponding response utterances. In experiments on a real-world dialogue dataset for anti-fraud in the financial domain, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
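To make the two-level structure concrete, the following is a minimal sketch (not the authors' code) of an HRL setup of the kind described above: a high-level policy that picks a discrete sub-goal from the encoded dialogue context, and a low-level policy that generates the response conditioned on that sub-goal. All module names, dimensions, and the GRU-based decoder are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch of a hierarchical dialogue policy (assumed design, not GoChat's actual code).
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Maps an encoded dialogue context to a distribution over discrete sub-goals."""

    def __init__(self, context_dim: int, num_subgoals: int):
        super().__init__()
        self.head = nn.Linear(context_dim, num_subgoals)

    def forward(self, context: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.head(context))


class LowLevelPolicy(nn.Module):
    """Generates a response token-by-token, conditioned on the context and the chosen sub-goal."""

    def __init__(self, vocab_size: int, context_dim: int, num_subgoals: int, hidden: int = 256):
        super().__init__()
        self.subgoal_emb = nn.Embedding(num_subgoals, hidden)
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.init_proj = nn.Linear(context_dim + hidden, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context: torch.Tensor, subgoal: torch.Tensor,
                response_tokens: torch.Tensor) -> torch.Tensor:
        # Initialize the decoder state from the dialogue context and the sub-goal embedding.
        h0 = torch.tanh(self.init_proj(
            torch.cat([context, self.subgoal_emb(subgoal)], dim=-1)))
        emb = self.token_emb(response_tokens)
        out, _ = self.decoder(emb, h0.unsqueeze(0))
        return self.out(out)  # per-step logits over the vocabulary


# Usage: sample a sub-goal for each dialogue, then score a teacher-forced response under it.
context = torch.randn(4, 128)                      # encoded dialogue history (batch of 4)
high = HighLevelPolicy(context_dim=128, num_subgoals=8)
low = LowLevelPolicy(vocab_size=5000, context_dim=128, num_subgoals=8)
subgoal = high(context).sample()                   # high-level action: one sub-goal per dialogue
logits = low(context, subgoal, torch.randint(0, 5000, (4, 10)))  # low-level action: the utterance
```

In such a setup, both levels would typically be trained with policy-gradient updates, the high level rewarded by progress towards the final goal and the low level by how well its utterances realize the assigned sub-goal; the specific reward shaping here is an assumption, not taken from the paper.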
