A shopbot is a software agent whose goal is to maximize buyers' satisfaction by automatically gathering price and quality information on goods and services from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learned agent increases profitability and eliminates the cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. To converge, Q-learning must select a sequence of state-action pairs. When state-action pairs are selected uniformly at random, the number of accesses to the Q-table needed to obtain the optimal Q-values is quite large. Therefore, this method is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because uniform random selection reflects the uncertainty of exploitation for the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both the auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster, on average, than with uniform random selection.
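To make the baseline concrete, the following is a minimal sketch of tabular Q-learning in which state-action pairs are selected uniformly at random, counting how many Q-table updates are needed before the table is close to the optimal Q-values. Everything specific here is an assumption made for illustration and is not taken from the paper: the four-level toy pricing model, the `reward` and `next_state` functions, the discount factor, the learning rate, and the convergence tolerance. MNP itself is not sketched, since the abstract does not specify it.

```python
import random

# Hypothetical toy pricing model (an assumption, not the paper's economy):
# the state is the competitor's last price index, the action is our price index.
N_PRICES = 4
GAMMA = 0.5  # assumed discount factor


def reward(state, action):
    """Assumed profit: undercutting wins the sale at our price, a tie splits it."""
    if action < state:
        return float(action + 1)
    if action == state:
        return (action + 1) / 2.0
    return 0.0


def next_state(state, action):
    """Assumed competitor response: undercut us by one step, or reset to the top price."""
    return action - 1 if action > 0 else N_PRICES - 1


def optimal_q(tol=1e-10):
    """Reference optimal Q-values for the toy model, computed by value iteration."""
    q = [[0.0] * N_PRICES for _ in range(N_PRICES)]
    while True:
        delta = 0.0
        for s in range(N_PRICES):
            for a in range(N_PRICES):
                target = reward(s, a) + GAMMA * max(q[next_state(s, a)])
                delta = max(delta, abs(target - q[s][a]))
                q[s][a] = target
        if delta < tol:
            return q


def max_error(q, q_star):
    """Largest absolute difference between two Q-tables."""
    return max(abs(q[s][a] - q_star[s][a])
               for s in range(N_PRICES) for a in range(N_PRICES))


def q_learning_uniform(q_star, alpha=0.5, eps=1e-3, seed=0, max_updates=1_000_000):
    """Q-learning with uniformly random state-action selection.

    Returns the learned table and the number of Q-table updates performed
    before every entry was within eps of the reference optimal value.
    """
    rng = random.Random(seed)
    q = [[0.0] * N_PRICES for _ in range(N_PRICES)]
    for n in range(1, max_updates + 1):
        s = rng.randrange(N_PRICES)  # uniform random state
        a = rng.randrange(N_PRICES)  # uniform random action
        target = reward(s, a) + GAMMA * max(q[next_state(s, a)])
        q[s][a] += alpha * (target - q[s][a])  # standard Q-learning update
        if max_error(q, q_star) < eps:
            return q, n
    return q, max_updates


if __name__ == "__main__":
    q_star = optimal_q()
    _, updates = q_learning_uniform(q_star)
    print("Q-table updates until convergence:", updates)
```

The update count returned by `q_learning_uniform` plays the role of the "number of accesses to the Q-table" in this toy setting; a different selection policy could be compared by swapping out the two `rng.randrange` lines and measuring the same count.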