Paper Information - Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games

Many multiagent Q-learning algorithms have been proposed to date, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). In a previous paper, the author proposed utility-based Q-learning for PD, which used utilities as rewards in order to maintain mutual cooperation once it had occurred. However, since the agent's action depends on the relation among its Q-values, mutual cooperation can be maintained by adjusting the learning rate of Q-learning. Thus, in this paper, we deal with the learning rate directly and introduce a new Q-learning method called learning-rate adjusting Q-learning, or LRA-Q.
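The point that the chosen action depends on the relation between Q-values, and that the learning rate shapes that relation, can be illustrated with a minimal sketch. This is not the LRA-Q rule itself, which the abstract does not specify. It is the standard stateless Q-learning update applied once, under assumed payoffs (T=5, R=3, P=1, S=0) and hypothetical starting Q-values; the names `PAYOFF`, `q_update`, and `greedy` are illustrative only.

```python
# Assumed PD payoffs for the learning agent, keyed by (own action, opponent action).
# The values satisfy the usual ordering T > R > P > S; they are not from the paper.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


def q_update(q, action, reward, alpha):
    """Standard stateless Q-learning step: Q(a) <- Q(a) + alpha * (reward - Q(a))."""
    new_q = dict(q)  # copy so the caller's table is left unchanged
    new_q[action] += alpha * (reward - q[action])
    return new_q


def greedy(q):
    """Return the action with the larger Q-value (ties go to cooperation)."""
    return "C" if q["C"] >= q["D"] else "D"


# Hypothetical Q-values of an agent that currently prefers to cooperate.
q = {"C": 3.0, "D": 2.5}

# The agent cooperates while the opponent defects, so it receives S.
reward = PAYOFF[("C", "D")]

# The same experience, learned with two different learning rates.
q_large = q_update(q, "C", reward, alpha=0.5)
q_small = q_update(q, "C", reward, alpha=0.1)

print(greedy(q_large))  # Q(C) fell below Q(D): the greedy action flips to D
print(greedy(q_small))  # Q(C) stays above Q(D): the greedy action remains C
```

With the larger learning rate, a single bad outcome pushes Q(C) below Q(D) and the greedy choice switches to defection. With the smaller one, the ordering of the Q-values survives and cooperation persists. How LRA-Q actually chooses its learning rate is left to the paper.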

Koichi Moriyama
