Paper information - A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games

A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games

We present a polynomial-time algorithm that always finds an (approximate) Nash equilibrium for repeated two-player stochastic games. The algorithm exploits the folk theorem to derive a strategy profile that forms an equilibrium by buttressing mutually beneficial behavior with threats, where possible. One component of our algorithm efficiently searches for an approximation of the egalitarian point, the fairest Pareto-efficient solution. The paper concludes by applying the algorithm to a set of grid games to illustrate the solutions it typically finds. These solutions compare very favorably to those found by competing algorithms: the resulting strategies achieve higher social welfare, and the algorithm comes with guaranteed computational efficiency.
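To make the notion of an egalitarian point concrete, here is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the feasible outcomes are given as a small finite list of average payoff pairs, that a threat point (the value each player can guarantee alone) is already known, and that "fairest" means maximizing the smaller of the two players' gains over that threat point. The function name `egalitarian_point` and its arguments are hypothetical.

```python
from itertools import combinations


def egalitarian_point(payoffs, threat):
    """Maximize the smaller player's gain over `threat` (assumed known),
    searching single payoff pairs and mixtures of two payoff pairs."""

    def advantage(p):
        # Gain of the worse-off player relative to the threat point.
        return min(p[0] - threat[0], p[1] - threat[1])

    best = max(payoffs, key=advantage)
    best_val = advantage(best)

    for a, b in combinations(payoffs, 2):
        a1, a2 = a[0] - threat[0], a[1] - threat[1]
        b1, b2 = b[0] - threat[0], b[1] - threat[1]
        # Weight w on `a` that equalizes both players' gains:
        # w*a1 + (1-w)*b1 == w*a2 + (1-w)*b2
        denom = (a1 - b1) - (a2 - b2)
        if denom == 0:
            continue  # gains never cross along this segment
        w = (b2 - b1) / denom
        if 0 < w < 1:
            point = (w * a[0] + (1 - w) * b[0], w * a[1] + (1 - w) * b[1])
            val = advantage(point)
            if val > best_val:
                best, best_val = point, val

    return best_val, best


# Two lopsided outcomes and one fair but weak outcome.
value, point = egalitarian_point([(4, 0), (0, 4), (1, 1)], threat=(0, 0))
```

In this toy input, alternating evenly between the two lopsided outcomes gives each player more than the fair but weak outcome does, which illustrates why mixtures are searched in addition to single outcomes. In a repeated stochastic game the candidate payoff pairs would have to come from evaluating policies, which this sketch does not attempt.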

Michael L. Littman | Enrique Munoz de Cote
