Paper Information - Convergence Problems of General-Sum Multiagent Reinforcement Learning

Stochastic games generalize MDPs to multiple agents and can be used as a framework for investigating multiagent learning. Hu and Wellman (1998) recently proposed a multiagent Q-learning method for general-sum stochastic games. In addition to describing the algorithm, they provide a proof that the method will converge to a Nash equilibrium for the game under specified conditions. The convergence result depends on a lemma stating that the iteration used by this method is a contraction mapping. Unfortunately, the proof is incomplete. In this paper we present a counterexample to the lemma and identify the flaw in its proof. We also introduce strengthened assumptions under which the lemma holds, and examine how this affects the classes of games to which the theoretical result can be applied.

Michael H. Bowling

[1] J. Nash. Equilibrium Points in N-Person Games, 1950, Proceedings of the National Academy of Sciences of the United States of America.

[2] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[3] C. Watkins. Learning from Delayed Rewards, 1989.

[4] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[5] Ariel Rubinstein, et al. A Course in Game Theory, 1995.

[6] J. Filar, et al. Competitive Markov Decision Processes, 1996.

[7] H. Kuhn. Classics in Game Theory, 1997.

[8] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[9] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[10] D. Fudenberg, et al. The Theory of Learning in Games, 1998.

[11] M. Cripps. The Theory of Learning in Games, 1999.

[12] Michael P. Wellman, et al. Learning in Dynamic Noncooperative Multiagent Systems, 1999.