Application of reinforcement learning in dynamic pricing algorithms

This paper addresses dynamic pricing in a duopoly in electronic retail markets. Combining the concept of performance potential with simulated annealing Q-learning (SA-Q) and the win-or-learn-fast policy hill-climbing algorithm (WoLF-PHC), it solves multi-agent learning problems under either the average-reward or the discounted-reward criterion, in the setting where only partial information about the opponent is known. Simulation results show that WoLF-PHC adapts well to changes in the environment and attains better learning values than SA-Q.
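The two learners named in the abstract can be illustrated on a toy repeated pricing game. This is a minimal sketch under stated assumptions, not the authors' implementation. The price grid, unit cost, buyer count and the "cheapest seller takes every buyer, ties split" demand rule are all hypothetical. Each seller's state is only the rival's last posted price, which is one possible reading of "partial information". Only the discounted-reward criterion is used, so the performance-potential formulation that lets the paper also handle average reward is not reproduced. `SAQAgent` chooses actions with a Metropolis acceptance test on the Q-value gap under a decaying temperature. `WoLFPHCAgent` follows the published WoLF-PHC update: it takes a small policy step when its current policy outperforms its running average policy ("win") and a larger step otherwise ("learn fast"). All names and parameter values here are illustrative.

```python
import math
import random

# Hypothetical toy market (not from the paper): discrete price grid, unit cost,
# and a fixed number of price-sensitive buyers per period.
PRICES = [1, 2, 3, 4, 5]
COST = 0.5
N_BUYERS = 10


def profits(p0, p1):
    """Assumed demand rule: every buyer goes to the cheaper seller; ties split."""
    if p0 < p1:
        return (p0 - COST) * N_BUYERS, 0.0
    if p1 < p0:
        return 0.0, (p1 - COST) * N_BUYERS
    half = N_BUYERS / 2
    return (p0 - COST) * half, (p1 - COST) * half


class SAQAgent:
    """Q-learning whose exploration uses a Metropolis (simulated annealing) test."""

    def __init__(self, n, alpha=0.1, gamma=0.9, temp=10.0, cooling=0.999):
        self.n, self.alpha, self.gamma = n, alpha, gamma
        self.temp, self.cooling = temp, cooling
        self.q = [[0.0] * n for _ in range(n)]  # q[state][action]

    def act(self, s):
        greedy = max(range(self.n), key=lambda a: self.q[s][a])
        trial = random.randrange(self.n)
        # Accept the random action with probability exp(dQ / T); dQ <= 0 here.
        dq = self.q[s][trial] - self.q[s][greedy]
        return trial if random.random() < math.exp(dq / self.temp) else greedy

    def learn(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])  # discounted-reward target
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        self.temp = max(self.temp * self.cooling, 1e-3)  # cool the temperature


class WoLFPHCAgent:
    """WoLF policy hill climbing with a discounted-reward Q update."""

    def __init__(self, n, alpha=0.1, gamma=0.9, d_win=0.01, d_lose=0.04):
        self.n, self.alpha, self.gamma = n, alpha, gamma
        self.d_win, self.d_lose = d_win, d_lose
        self.q = [[0.0] * n for _ in range(n)]
        self.pi = [[1.0 / n] * n for _ in range(n)]   # current mixed policy
        self.avg = [[1.0 / n] * n for _ in range(n)]  # running average policy
        self.count = [0] * n

    def act(self, s):
        return random.choices(range(self.n), weights=self.pi[s])[0]

    def learn(self, s, a, r, s2):
        q, pi, avg = self.q[s], self.pi[s], self.avg[s]
        q[a] += self.alpha * (r + self.gamma * max(self.q[s2]) - q[a])
        self.count[s] += 1
        for b in range(self.n):
            avg[b] += (pi[b] - avg[b]) / self.count[s]
        # "Winning" if the current policy scores better than the average policy.
        winning = sum(p * v for p, v in zip(pi, q)) > sum(p * v for p, v in zip(avg, q))
        delta = self.d_win if winning else self.d_lose
        best = max(range(self.n), key=lambda b: q[b])
        moved = 0.0
        for b in range(self.n):  # shift probability mass toward the greedy action
            if b != best:
                step = min(pi[b], delta / (self.n - 1))
                pi[b] -= step
                moved += step
        pi[best] += moved


def simulate(agent0, agent1, periods=20000, seed=0):
    """Repeated duopoly; each seller observes only the rival's last price index."""
    random.seed(seed)
    s0 = s1 = 0
    total = [0.0, 0.0]
    for _ in range(periods):
        a0, a1 = agent0.act(s0), agent1.act(s1)
        r0, r1 = profits(PRICES[a0], PRICES[a1])
        agent0.learn(s0, a0, r0, a1)
        agent1.learn(s1, a1, r1, a0)
        s0, s1 = a1, a0
        total[0] += r0
        total[1] += r1
    return [t / periods for t in total]


if __name__ == "__main__":
    n = len(PRICES)
    print(simulate(WoLFPHCAgent(n), SAQAgent(n)))
```

Running the file prints each seller's average profit per period. Because both learners explore randomly, the figures depend on the seed, so a single run of this toy model says nothing about the paper's reported comparison.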

Wang Jintian | Zhou Lei
