论文信息 - Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exaccerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforfcement learning involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm's parameters are easy to set to meet the convergence conditions.

Xiaofeng Wang | Tuomas Sandholm | T. Sandholm | Xiaofeng Wang

[1] D. Fudenberg,et al. The Theory of Learning in Games , 1998 .

[2] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[3] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .

[4] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.

[5] Michael L. Littman,et al. Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[6] R. Rob,et al. Learning, Mutation, and Long Run Equilibria in Games , 1993 .

[7] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[8] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.

[9] Gerhard Weiss,et al. Learning to Coordinate Actions in Multi-Agent-Systems , 1993, IJCAI.

[10] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[11] Craig Boutilier,et al. Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[12] Sandip Sen,et al. Learning to Coordinate without Sharing Information , 1994, AAAI.

[13] Robert H. Crites,et al. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma. , 1996, Bio Systems.

[14] H. Young,et al. The Evolution of Conventions , 1993 .