Self-evaluated Learning Agent in Multiple State Games

Most multi-agent reinforcement learning algorithms aim to converge to a Nash equilibrium, but a Nash equilibrium is not necessarily a desirable outcome. Conversely, several methods aim to escape from unfavorable Nash equilibria, but they are effective only in limited classes of games. Building on these methods, the authors proposed in a previous paper [11] an agent that learns appropriate actions in both PD-like and non-PD-like games through self-evaluation. However, the experiments we had conducted were static ones in which there was only one state. Versatility across PD-like and non-PD-like games is indispensable in dynamic environments, in which several states succeed one another within a trial. We have therefore conducted new experiments in each of which the agents played a game with multiple states. The experiments cover two kinds of game: one notifies the agents of the current state and the other does not. This paper reports the results.
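To make the setting concrete, the following is a minimal sketch (in Python, with illustrative payoff values and class names not taken from the paper) of a two-state game of the kind described: one state is PD-like (mutual defection is the Nash equilibrium), the other is non-PD-like, states transfer one after another within a trial, and a flag controls whether agents are notified of the current state.

```python
# Hypothetical two-state game sketch; payoffs, transition rule, and
# names are assumptions for illustration, not the paper's actual setup.
COOPERATE, DEFECT = 0, 1

# PAYOFFS[state][(a1, a2)] -> (reward to agent 1, reward to agent 2)
PAYOFFS = {
    0: {  # PD-like state: defection dominates, mutual defection is the NE
        (COOPERATE, COOPERATE): (3, 3),
        (COOPERATE, DEFECT):    (0, 5),
        (DEFECT, COOPERATE):    (5, 0),
        (DEFECT, DEFECT):       (1, 1),
    },
    1: {  # non-PD-like state: mutual cooperation is both NE and optimal
        (COOPERATE, COOPERATE): (4, 4),
        (COOPERATE, DEFECT):    (1, 2),
        (DEFECT, COOPERATE):    (2, 1),
        (DEFECT, DEFECT):       (2, 2),
    },
}

class MultiStateGame:
    """Two-agent, two-state matrix game with an observability switch."""

    def __init__(self, observable=True):
        self.state = 0
        self.observable = observable  # whether agents are told the state

    def observation(self):
        # Observable variant returns the true state; the hidden variant
        # gives the agents no state information at all.
        return self.state if self.observable else None

    def step(self, a1, a2):
        rewards = PAYOFFS[self.state][(a1, a2)]
        # Example transition rule: mutual defection flips the state,
        # so the game alternates between PD-like and non-PD-like phases.
        if (a1, a2) == (DEFECT, DEFECT):
            self.state = 1 - self.state
        return rewards, self.observation()
```

In the hidden-state variant the agents face the same payoff dynamics but must cope without knowing which game they are currently playing, which is exactly the distinction between the two kinds of experiment reported here.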

[1] Riichiro Mizoguchi, et al. PRICAI 2000 Topics in Artificial Intelligence, 2000, Lecture Notes in Computer Science.

[2] Jürgen Schmidhuber, et al. Reinforcement Learning with Self-Modifying Policies, 1998, Learning to Learn.

[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[4] G. Hardin, et al. The Tragedy of the Commons, 1968, Green Planet Blues.

[5] Yukinori Kakazu, et al. Co-operative Reinforcement Learning By Payoff Filters (Extended Abstract), 1995, ECML.

[6] Chimay J. Anumba, et al. Negotiation in a multi-agent system for construction claims negotiation, 2002, Appl. Artif. Intell.

[7] Masayuki Numao, et al. Constructing an Autonomous Agent with an Interdependent Heuristics, 2000, PRICAI.

[8] Corso Elvezia. Solving a Complex Prisoner's Dilemma with Self-Modifying Policies, 1998.

[9] Stefan Wrobel, et al. Machine Learning: ECML-95, 1995, Lecture Notes in Computer Science.

[10] Masayuki Numao, et al. Construction of a learning agent handling its rewards according to environmental situations, 2002, AAMAS '02.

[11] Jörgen W. Weibull, et al. Evolutionary Game Theory, 1996.

[12] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[13] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[14] Jean-Arcady Meyer, et al. Solving a Complex Prisoner's Dilemma with Self-Modifying Policies, 1998.

[15] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[16] Manuela M. Veloso, et al. Multiagent learning using a variable learning rate, 2002, Artif. Intell.

[17] G. Pagnoni, et al. A Neural Basis for Social Cooperation, 2002, Neuron.

[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[19] Gerhard Widmer, et al. Learning in the Presence of Concept Drift and Hidden Contexts, 1996, Machine Learning.

[20] S. Mikami. Cooperative reinforcement learning by Payoff filters, 1995.

[21] Sebastian Thrun, et al. Learning to Learn, 1998, Springer US.

[22] Shin Ishii, et al. Multi-agent reinforcement learning: an approach based on the other agent's internal model, 2000, Proceedings Fourth International Conference on MultiAgent Systems.

[23] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.

[24] Kagan Tumer, et al. Collective Intelligence, Data Routing and Braess' Paradox, 2002, J. Artif. Intell. Res.

[25] Sandip Sen, et al. Evolving agent societies that avoid social dilemmas, 2000, GECCO.

[26] Hiroshi Yokoi, et al. Self-organized norms of behavior under interactions of selfish agents, 1999, IEEE SMC'99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).