Paper Information - Cyclic Equilibria in Markov Games

Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.

[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[2] R. Bellman. Dynamic programming, 1957, Science.

[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[4] Ariel Rubinstein, et al. A Course in Game Theory, 1995.

[5] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.

[6] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[7] Michael L. Littman, et al. Friend-or-Foe Q-learning in General-Sum Games, 2001, ICML.

[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[9] Thomas Dyhre Nielsen, et al. Symbolic and Quantitative Approaches to Reasoning with Uncertainty, 2003, Lecture Notes in Computer Science.

[10] Keith B. Hall, et al. Correlated Q-Learning, 2003, ICML.

[11] Jeffrey O. Kephart, et al. Pricing in Agent Economies Using Multi-Agent Q-Learning, 2002, Autonomous Agents and Multi-Agent Systems.