Empirical comparison of various reinforcement learning strategies for sequential targeted marketing

We empirically evaluate the performance of various reinforcement learning methods applied to sequential targeted marketing. In particular, we propose and evaluate a progression of reinforcement learning methods, ranging from "direct" or "batch" methods to "indirect" or "simulation-based" methods, along with those that we call "semi-direct" methods, which fall between the two. We conduct a number of controlled experiments to evaluate the performance of these competing methods. Our results indicate that while the indirect methods can perform better when nearly perfect modeling is possible, their performance tends to degrade under more realistic conditions in which the system's modeling parameters have restricted attention. We also show that semi-direct methods are effective in reducing the amount of computation necessary to attain a given level of performance, and often result in more profitable policies.
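To make the direct versus indirect distinction concrete, the following is a minimal sketch using textbook tabular methods, not the methods evaluated in this work. It assumes a hypothetical two-state customer model ("new", "active"), two hypothetical actions ("mail", "skip"), made-up rewards, and a discount factor of 0.9. The direct variant applies Q-learning updates repeatedly to a fixed log of transitions; the indirect variant first estimates a transition and reward model from the same log and then plans against that model by value iteration. Semi-direct methods are not sketched here, since the abstract does not define them.

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state).
# All names and numbers are illustrative assumptions, not data from the paper.
LOG = [
    ("new", "mail", -1.0, "active"),    # mailing a new customer costs money now
    ("new", "skip", 0.0, "new"),
    ("active", "mail", 5.0, "active"),  # mailing an active customer pays off
    ("active", "skip", 0.0, "new"),
]
ACTIONS = ("mail", "skip")
GAMMA = 0.9  # assumed discount factor


def batch_q_learning(log, sweeps=200, alpha=0.5):
    """Direct (batch) sketch: Q-learning updates applied to fixed logged data."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2 in log:
            target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def model_based_q(log, sweeps=200):
    """Indirect sketch: estimate a model from the log, then run value iteration."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in log:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r

    q = defaultdict(float)
    for _ in range(sweeps):
        for (s, a), nexts in counts.items():
            n = sum(nexts.values())
            expected_next = sum(
                (c / n) * max(q[(s2, b)] for b in ACTIONS)
                for s2, c in nexts.items()
            )
            q[(s, a)] = reward_sum[(s, a)] / n + GAMMA * expected_next
    return q


def greedy_policy(q, states):
    """Pick the action with the highest estimated value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


if __name__ == "__main__":
    states = ("new", "active")
    print("direct:  ", greedy_policy(batch_q_learning(LOG), states))
    print("indirect:", greedy_policy(model_based_q(LOG), states))
```

In this toy setting both variants choose to mail a new customer even though the immediate reward is negative, because the discounted future value of an active customer outweighs the up-front cost. The log here covers every state-action pair with deterministic outcomes, so the estimated model is exact and the two approaches agree; the sketch does not reproduce the degradation under imperfect modeling reported above.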
