Concurrent Reinforcement Learning from Customer Interactions

In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between the company and each customer. A key aspect of this setting is that interactions with different customers occur in parallel. It is therefore imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, which uses a variant of temporal-difference learning to learn efficiently from partial interaction sequences. We evaluate our algorithms on two large-scale test-beds, one for online interaction and one for email interaction, both generated from a database of 300,000 customer records.
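As a rough illustration of learning from partial interaction sequences, the sketch below applies plain TD(0) with a linear value function whose weights are shared across customers, updating after every single transition rather than waiting for a customer's sequence to finish. It is not the paper's algorithm: the function names (`td0_update`, `run_concurrent`), the feature vectors, the step size `alpha`, the discount `gamma`, and the random interleaving of customers are all assumptions made for this example.

```python
import random


def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """One TD(0) step on a shared linear value function.

    next_features is None when the customer's sequence has ended.
    All names and default values here are illustrative assumptions.
    """
    value = sum(w * x for w, x in zip(weights, features))
    if next_features is None:
        next_value = 0.0
    else:
        next_value = sum(w * x for w, x in zip(weights, next_features))
    td_error = reward + gamma * next_value - value
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


def run_concurrent(customers, weights, seed=0):
    """Interleave transitions from many customers (hypothetical scheduler).

    customers maps a customer id to a list of
    (features, reward, next_features) transitions. The shared weights are
    updated after each transition, so what is learned from one customer's
    partial sequence is available for the next customer's transition.
    """
    rng = random.Random(seed)
    cursors = {cid: 0 for cid in customers}
    active = [cid for cid in customers if customers[cid]]
    while active:
        cid = rng.choice(active)  # simulate interactions arriving in parallel
        features, reward, next_features = customers[cid][cursors[cid]]
        weights = td0_update(weights, features, reward, next_features)
        cursors[cid] += 1
        if cursors[cid] >= len(customers[cid]):
            active.remove(cid)
    return weights


if __name__ == "__main__":
    # Toy data: two customers, two binary features (purely made up).
    toy = {
        "a": [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, None)],
        "b": [([0.0, 1.0], 1.0, None)],
    }
    print(run_concurrent(toy, [0.0, 0.0]))
```

The design point is only that the weight vector is global while the interaction sequences are per customer, so no update has to wait for an episode to complete.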
