Learning dynamic prices in multiseller electronic retail markets with price-sensitive customers, stochastic demands, and inventory replenishments

In this paper, we use reinforcement learning (RL) as a tool to study price dynamics in an electronic retail market consisting of two competing sellers and customers who are sensitive to both price and lead time. The sellers offer identical products, compete on price to satisfy stochastically arriving demands (customers), and follow standard inventory control and replenishment policies to manage their inventories. RL techniques have not previously been applied in such a generalized setting. We consider two representative cases: 1) the no information case, where neither seller has any information about the customer queue levels, inventory levels, or prices of its competitor; and 2) the partial information case, where each seller has information about the customer queue levels and inventory levels of its competitor. Sellers employ automated pricing agents, or pricebots, which use RL-based pricing algorithms to reset prices at random intervals based on factors such as the number of back orders, inventory levels, and replenishment lead times, with the objective of maximizing discounted cumulative profit. In the no information case, we show that a seller who uses Q-learning outperforms a seller who uses derivative following (DF). In the partial information case, we model the problem as a Markovian game and use actor-critic based RL to learn dynamic prices. We believe our approach to solving these problems is a new and promising way of setting dynamic prices in multiseller environments with stochastic demands, price-sensitive customers, and inventory replenishments.
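To make the two pricing rules compared in the no information case concrete, the following is a minimal Python sketch of a derivative-following pricer and a tabular Q-learning pricer. It is an illustration under stated assumptions, not the paper's implementation: the class names, the step size, the price grid, the state encoding (for example a tuple of inventory level and back orders), and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical. The sketch also applies one fixed discount per decision, whereas the abstract describes prices being reset at random intervals.

```python
import random


class DerivativeFollower:
    """Derivative following: keep moving the price in the same direction
    while profit does not fall, and reverse direction when it falls."""

    def __init__(self, price, step=1.0, p_min=1.0, p_max=100.0):
        self.price = price
        self.step = step            # hypothetical fixed step size
        self.direction = 1          # +1 raises the price, -1 lowers it
        self.last_profit = None
        self.p_min, self.p_max = p_min, p_max

    def update(self, profit):
        # Reverse direction only if the latest profit is lower than the previous one.
        if self.last_profit is not None and profit < self.last_profit:
            self.direction = -self.direction
        self.last_profit = profit
        candidate = self.price + self.direction * self.step
        self.price = min(self.p_max, max(self.p_min, candidate))
        return self.price


class QLearningPricer:
    """Tabular Q-learning over a discrete price grid (assumed, not from the paper)."""

    def __init__(self, prices, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.prices = list(prices)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                 # maps state -> list of Q-values, one per price
        self.rng = random.Random(seed)

    def _row(self, state):
        return self.q.setdefault(state, [0.0] * len(self.prices))

    def choose(self, state):
        # Epsilon-greedy choice of a price index.
        row = self._row(state)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prices))
        return max(range(len(row)), key=row.__getitem__)

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update toward the discounted target.
        row = self._row(state)
        target = reward + self.gamma * max(self._row(next_state))
        row[action] += self.alpha * (target - row[action])
```

In a simulation loop, each seller would observe its own state, pick a price, receive the profit earned until its next pricing decision, and then call `update` or `learn`. The derivative follower uses only its last two profits, while the Q-learner can condition the price on state variables such as inventory and back orders.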
