In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to this broader framework. We design a multiagent Q-learning method under this framework and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when the game has a unique Nash equilibrium. When the game has multiple Nash equilibria, this algorithm should be combined with other learning techniques to find optimal strategies.
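The backup step described above replaces single-agent Q-learning's max operator with each agent's payoff at a Nash equilibrium of the stage game induced by the current Q-values at the next state. The following is a minimal sketch of that idea, not the paper's implementation: the names `pure_nash` and `nash_q_update` are illustrative, and for simplicity the stage game is solved only for pure-strategy equilibria by brute force, whereas the paper's method uses mixed-strategy Nash equilibria (computable, e.g., by quadratic programming as in Mangasarian's work cited below).

```python
def pure_nash(payoff1, payoff2):
    """Find all pure-strategy Nash equilibria of a bimatrix stage game
    by checking mutual best responses (illustrative brute force)."""
    n, m = len(payoff1), len(payoff1[0])
    eqs = []
    for i in range(n):
        for j in range(m):
            best1 = all(payoff1[i][j] >= payoff1[k][j] for k in range(n))
            best2 = all(payoff2[i][j] >= payoff2[i][l] for l in range(m))
            if best1 and best2:
                eqs.append((i, j))
    return eqs

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One multiagent Q-learning backup: each agent bootstraps on its own
    payoff at a Nash equilibrium of the stage game (Q1[s_next], Q2[s_next]).
    Assumes at least one pure equilibrium exists in this toy setting."""
    i, j = pure_nash(Q1[s_next], Q2[s_next])[0]
    Q1[s][a1][a2] += alpha * (r1 + gamma * Q1[s_next][i][j] - Q1[s][a1][a2])
    Q2[s][a1][a2] += alpha * (r2 + gamma * Q2[s_next][i][j] - Q2[s][a1][a2])

# Coordination stage game with two pure equilibria: (0,0) pays (2,2), (1,1) pays (1,1).
A = [[2, 0], [0, 1]]
assert pure_nash(A, A) == [(0, 0), (1, 1)]

# One backup from state 0 toward state 1, whose stage game is A for both agents.
Q1 = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[2, 0], [0, 1]]}
Q2 = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[2, 0], [0, 1]]}
nash_q_update(Q1, Q2, s=0, a1=0, a2=0, r1=1, r2=1, s_next=1)
# Q1[0][0][0] is now 0.1 * (1 + 0.9 * 2) = 0.28
```

Note the equilibrium-selection issue the abstract raises: when `pure_nash` returns several equilibria, taking the first is arbitrary, which is why multiple-equilibrium games call for additional learning machinery.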
References:
O. Mangasarian et al. Two-Person Nonzero-Sum Games and Quadratic Programming.
Optimality and Equilibria in Stochastic Games.
Michael L. Littman et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning.
Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics.
Ariel Rubinstein et al. A Course in Game Theory.
Andrew W. Moore et al. Reinforcement Learning: A Survey. J. Artif. Intell. Res.
Tucker Balch et al. Learning Roles: Behavioral Diversity in Robot Teams.
Craig Boutilier et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems.
Csaba Szepesvári et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms.