Application of reinforcement learning in dynamic pricing algorithms

This paper is concerned with the dynamic pricing problem for a duopoly in electronic retail markets. Building on the concept of performance potential, the simulated annealing Q-learning (SA-Q) algorithm and the win-or-learn-fast policy hill-climbing (WoLF-PHC) algorithm are applied to the learning problems of multi-agent systems under either the average-reward or the discounted-reward criterion, in the case where only partial information about the opponent is available. The simulation results show that the WoLF-PHC algorithm adapts well to changes in the environment and attains better learning values than the SA-Q algorithm.
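The abstract names the two learners without giving their update rules. The Python sketch below shows one common textbook form of each, under stated assumptions: SA-Q as tabular Q-learning whose action choice follows the Metropolis criterion (a random candidate action replaces the greedy one with probability exp(dQ/T), with the temperature T cooled over time), and WoLF-PHC in the style of Bowling and Veloso (policy hill climbing with a small step `delta_win` when "winning" and a larger step `delta_lose` when "losing"). It is not the paper's implementation: it uses only a discounted one-step update rather than the performance-potential formulation covering both reward criteria, and the price grid `PRICES`, the winner-take-all demand in `profits`, the state (each seller sees only the rival's last price), and all parameter values are hypothetical.

```python
import math
import random


class SAQAgent:
    """Tabular Q-learning with Metropolis (simulated-annealing) exploration."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 temperature=1.0, cooling=0.999, rng=None):
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma
        self.temperature, self.cooling = temperature, cooling
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.rng = rng or random.Random(0)

    def act(self, s):
        row = self.q[s]
        greedy = max(range(self.n_actions), key=row.__getitem__)
        candidate = self.rng.randrange(self.n_actions)
        # Metropolis criterion: keep a worse random action with prob. exp(dq / T).
        dq = row[candidate] - row[greedy]
        if dq >= 0 or self.rng.random() < math.exp(dq / self.temperature):
            return candidate
        return greedy

    def update(self, s, a, r, s_next):
        # Discounted one-step Q-learning update (an assumption; see lead-in).
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        self.temperature = max(1e-3, self.temperature * self.cooling)  # cooling


class WoLFPHCAgent:
    """Policy hill climbing with the win-or-learn-fast variable step size."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, rng=None):
        # Requires n_actions >= 2; delta_lose > delta_win is the WoLF idea.
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.pi_avg = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.counts = [0] * n_states
        self.rng = rng or random.Random(0)

    def act(self, s):
        return self.rng.choices(range(self.n_actions), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        q, pi, avg = self.q[s], self.pi[s], self.pi_avg[s]
        q[a] += self.alpha * (r + self.gamma * max(self.q[s_next]) - q[a])
        self.counts[s] += 1
        for b in range(self.n_actions):  # running average of past policies
            avg[b] += (pi[b] - avg[b]) / self.counts[s]
        # Winning = current policy scores better against Q than the average one.
        winning = sum(p * v for p, v in zip(pi, q)) > sum(p * v for p, v in zip(avg, q))
        delta = self.delta_win if winning else self.delta_lose
        best = max(range(self.n_actions), key=q.__getitem__)
        moved = 0.0
        for b in range(self.n_actions):  # shift mass toward the greedy action
            if b != best:
                step = min(pi[b], delta / (self.n_actions - 1))
                pi[b] -= step
                moved += step
        pi[best] += moved


PRICES = [1.0, 2.0, 3.0]  # hypothetical price grid, zero unit cost


def profits(p1, p2):
    # Hypothetical demand: one buyer purchases from the cheaper seller; ties split.
    if p1 < p2:
        return p1, 0.0
    if p2 < p1:
        return 0.0, p2
    return p1 / 2, p2 / 2


def simulate(steps=5000, seed=1):
    n = len(PRICES)
    seller1 = WoLFPHCAgent(n, n, rng=random.Random(seed))
    seller2 = SAQAgent(n, n, rng=random.Random(seed + 1))
    s1 = s2 = 0  # each seller observes only the rival's last price index
    total1 = total2 = 0.0
    for _ in range(steps):
        a1, a2 = seller1.act(s1), seller2.act(s2)
        r1, r2 = profits(PRICES[a1], PRICES[a2])
        seller1.update(s1, a1, r1, a2)
        seller2.update(s2, a2, r2, a1)
        s1, s2 = a2, a1
        total1, total2 = total1 + r1, total2 + r2
    return total1 / steps, total2 / steps
```

Calling `simulate()` pits a WoLF-PHC seller against an SA-Q seller and returns each one's average per-period profit in this toy market; the numbers it produces say nothing about the paper's reported results. WoLF-PHC keeps an explicit mixed policy, which suits a duopoly where the best response keeps shifting, whereas SA-Q stays greedy apart from annealed exploration.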
