An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

Abstract: Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). However, multiagent environments are inherently non-stationary, since the other agents are free to change their behavior as they also learn and adapt. Stochastic games, first studied in the game theory community, are a natural extension of MDPs to include multiple agents. In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.
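To make the single-agent baseline concrete, the sketch below runs value iteration on a toy finite MDP. This is a minimal illustration, not an algorithm from the paper: the 2-state, 2-action transition and reward numbers are invented for the example, and the Bellman optimality backup V(s) = max_a [R(s,a) + γ Σ_s' P(s,a,s') V(s')] is the standard one for a stationary environment.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; the numbers below are illustrative only.
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)          # greedy improvement over actions
    if np.abs(V_new - V).max() < 1e-8:
        V = V_new
        break                      # converged to the optimal value function
    V = V_new
```

Because the backup is a γ-contraction, this converges regardless of initialization; it is exactly this fixed-point argument that breaks down once other learning agents make the environment non-stationary, which is the gap stochastic games address. Here state 1 admits a self-loop with reward 2, so its optimal value is 2/(1-γ) = 20.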
