Learning dynamic prices in electronic retail markets with customer segmentation

In this paper, we use reinforcement learning (RL) techniques to determine dynamic prices in an electronic monopolistic retail market. The market we consider consists of two natural segments of customers: captives and shoppers. Captives are mature, loyal buyers, whereas shoppers are more price sensitive and are attracted by sales promotions and volume discounts. The seller is the learning agent in the system and uses RL to learn from the environment. Under reasonable assumptions about the arrival process of customers, the inventory replenishment policy, and the replenishment lead time distribution, the system becomes a Markov decision process, which enables the use of a wide spectrum of learning algorithms. We use the Q-learning algorithm to arrive at dynamic prices that optimize the seller’s performance metric (either long-term discounted profit or long-run average profit per unit time). Our model and methodology can also be used to compute the optimal reorder quantity and reorder point for the seller’s inventory policy, and the optimal volume discounts to offer to shoppers.
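The idea of a seller learning prices by Q-learning over a Markov decision process can be illustrated with a minimal sketch. It is not the paper's model: the state is simply the inventory level, the actions are three hypothetical price points, replenishment is assumed to be instantaneous on stock-out, and the reservation prices and arrival mix for captives and shoppers are made-up values chosen only for illustration. The sketch uses the discounted-profit criterion.

```python
import random

# All constants below are hypothetical values chosen for illustration.
PRICES = [8.0, 10.0, 12.0]   # price menu; each index is one action
MAX_INV = 5                  # inventory cap; the state is the units in stock
UNIT_COST = 5.0              # per-unit purchase cost
CAPTIVE_LIMIT = 12.0         # captives buy at any price up to this value
SHOPPER_LIMIT = 9.0          # shoppers buy only at or below this value
P_CAPTIVE = 0.4              # probability that an arriving customer is a captive


def step(inventory, action, rng):
    """Simulate one customer arrival and return (next_inventory, reward)."""
    if inventory == 0:
        # Assumption: stock is refilled to MAX_INV with zero lead time,
        # and no sale is made in the period of the stock-out.
        return MAX_INV, 0.0
    price = PRICES[action]
    limit = CAPTIVE_LIMIT if rng.random() < P_CAPTIVE else SHOPPER_LIMIT
    if price <= limit:
        return inventory - 1, price - UNIT_COST
    return inventory, 0.0


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(PRICES) for _ in range(MAX_INV + 1)]
    for _ in range(episodes):
        s = MAX_INV
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(PRICES))          # explore
            else:
                a = max(range(len(PRICES)), key=lambda i: q[s][i])  # exploit
            s2, r = step(s, a, rng)
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_prices(q):
    """Return the price with the highest learned Q-value for each inventory level."""
    return [PRICES[max(range(len(PRICES)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    table = q_learning()
    for level, price in enumerate(greedy_prices(table)):
        print(f"inventory={level}: learned price={price}")
```

Extending the state with the pending-order status and the action with a volume discount for shoppers would bring the sketch closer to the setting described above, at the cost of a larger Q-table.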
