Sequential cost-sensitive decision making with reinforcement learning

Recently, there has been increasing interest in the issues of cost-sensitive learning and decision making in a variety of applications of data mining. A number of approaches have been developed that are effective at optimizing cost-sensitive decisions when each decision is considered in isolation. However, the issue of sequential decision making, with the goal of maximizing total benefits accrued over a period of time instead of immediate benefits, has rarely been addressed. In the present paper, we propose a novel approach to sequential decision making based on the reinforcement learning framework. Our approach attempts to learn decision rules that optimize a sequence of cost-sensitive decisions so as to maximize the total benefits accrued over time. We use the domain of targeted' marketing as a testbed for empirical evaluation of the proposed method. We conducted experiments using approximately two years of monthly promotion data derived from the well-known KDD Cup 1998 donation data set. The experimental results show that the proposed method for optimizing total accrued benefits out performs the usual targeted-marketing methodology of optimizing each promotion in isolation. We also analyze the behavior of the targeting rules that were obtained and discuss their appropriateness to the application domain.

[1]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[2]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[3]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[4]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[5]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[6]  John N. Tsitsiklis,et al.  Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[7]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[8]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[9]  Thomas G. Dietterich,et al.  Efficient Value Function Approximation Using Regression Trees , 1999 .

[10]  Salvatore J. Stolfo,et al.  AdaCost: Misclassification Cost-Sensitive Boosting , 1999, ICML.

[11]  Thomas G. Dietterich,et al.  Bootstrap Methods for the Cost-Sensitive Evaluation of Classifiers , 2000, ICML.

[12]  Peter D. Turney Cost-sensitive learning bibliography , 2000, The Web Conference.

[13]  Edwin P. D. Pednault,et al.  Segmentation-based modeling for advanced targeted marketing , 2001, KDD '01.

[14]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[15]  Bianca Zadrozny,et al.  Learning and making decisions when costs and probabilities are both unknown , 2001, KDD '01.

[16]  Edwin P. D. Pednault,et al.  Segmented Regression Estimators for Massive Data Sets , 2002, SDM.

[17]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.