Reinforcement Learning for User Intent Prediction in Customer Service Bots

A customer service bot is now a necessary component of an e-commerce platform. As a core module of the customer service bot, user intent prediction can help predict user questions before they ask. A typical solution is to find top candidate questions that a user will be interested in. Such solution ignores the inter-relationship between questions and often aims to maximize the immediate reward such as clicks, which may not be ideal in practice. Hence, we propose to view the problem as a sequential decision making process to better capture the long-term effects of each recommendation in the list. Intuitively, we formulate the problem as a Markov decision process and consider using reinforcement learning for the problem. With this approach, questions presented to users are both relevant and diverse. Experiments on offline real-world dataset and online system demonstrate the effectiveness of our proposed approach.

[1]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[2]  Heng-Tze Cheng,et al.  Wide & Deep Learning for Recommender Systems , 2016, DLRS@RecSys.

[3]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[4]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[5]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[6]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[7]  Gang Fu,et al.  Deep & Cross Network for Ad Click Predictions , 2017, ADKDD@KDD.

[8]  George Karypis,et al.  Item-based top-N recommendation algorithms , 2004, TOIS.

[9]  Wei Chu,et al.  AliMe Assist: An Intelligent Assistant for Creating an Innovative E-commerce Experience , 2017, CIKM.

[10]  Guy Shani,et al.  An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..

[11]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[12]  Wei Zeng,et al.  From Greedy Selection to Exploratory Decision-Making: Diverse Ranking with Policy-Value Networks , 2018, SIGIR.

[13]  Liang Zhang,et al.  Deep Reinforcement Learning for List-wise Recommendations , 2017, ArXiv.

[14]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[15]  Yunming Ye,et al.  DeepFM: A Factorization-Machine based Neural Network for CTR Prediction , 2017, IJCAI.

[16]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[17]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[18]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[19]  Marc Peter Deisenroth,et al.  Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.

[20]  Jiafeng Guo,et al.  Reinforcement Learning to Rank with Markov Decision Process , 2017, SIGIR.