Which Channel to Ask My Question?: Personalized Customer Service Request Stream Routing Using Deep Reinforcement Learning

Customer service is critical to every company, as it directly affects brand reputation. Because they serve a great number of customers, e-commerce companies often employ multiple communication channels to answer customers' questions, for example, a chatbot and a hotline. On one hand, each channel has limited capacity to respond to customers' requests; on the other hand, customers have different preferences over these channels. Current production systems are mainly built on business rules that only coarsely balance channel resources against customer satisfaction. To achieve an optimal tradeoff between the two, we propose a new framework based on deep reinforcement learning that directly takes both the resources and a user model into account. Within this framework, we also propose a new deep-reinforcement-learning-based routing method: double dueling deep Q-learning with prioritized experience replay ($\mathsf{PER-DoDDQN}$). We evaluate the proposed framework and method on both synthetic data and a real customer service log from a large financial technology company. We show that our deep-reinforcement-learning-based framework is superior to the existing production system, and that $\mathsf{PER-DoDDQN}$ outperforms all other deep Q-learning variants in practice, yielding a better routing plan. These observations suggest that our method finds a tradeoff at which both channel resource usage and customer satisfaction are optimized.
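To make the method's name concrete, the sketch below shows the three ingredients that $\mathsf{PER-DoDDQN}$ combines: a dueling Q-network, a double-Q target, and proportional prioritized experience replay. This is a minimal illustration, not the authors' implementation; the network sizes, hyperparameters (`alpha`, `beta`, `gamma`), and state/action dimensions are assumptions for the sake of a self-contained example.

```python
# Minimal sketch of the PER-DoDDQN building blocks (illustrative, not the
# paper's code). Requires: numpy, torch.
import numpy as np
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling architecture: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s,a)

    def forward(self, s):
        h = self.trunk(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)


class PrioritizedReplay:
    """Proportional prioritized replay: P(i) ~ p_i^alpha, with IS weights."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities, self.pos = [], [], 0

    def push(self, transition):
        max_p = max(self.priorities, default=1.0)  # new samples get max priority
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_p)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idx]) ** (-beta)  # importance sampling
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, torch.as_tensor(weights, dtype=torch.float32)

    def update(self, idx, td_errors, eps=1e-5):
        # Refresh priorities with the latest absolute TD errors.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(float(e)) + eps


def double_q_targets(online, target, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: pick argmax actions with the online net,
    evaluate them with the target net to reduce overestimation bias."""
    with torch.no_grad():
        best = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, best).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```

In a training loop, one would minimize the importance-weighted TD loss, e.g. `(weights * nn.functional.smooth_l1_loss(q, targets, reduction='none')).mean()`, then feed the absolute TD errors back through `update` so that surprising transitions are replayed more often.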
