Multiagent Q-Learning: Preliminary Study on Dominance between the Nash and Stackelberg Equilibriums

Some game-theoretic approaches to multiagent reinforcement learning in self-play, i.e., when all agents use the same algorithm to choose their actions, employ equilibria, such as the Nash equilibrium, to compute the agents' policies. These approaches have been applied only to simple examples. In this paper, we present an extended version of Nash Q-Learning that uses the Stackelberg equilibrium to address a wider range of games than Nash Q-Learning alone. We show that mixing the Nash and Stackelberg equilibria can lead to better rewards not only in static games but also in stochastic games. Moreover, we apply the algorithm to a real-world example, the automated vehicle coordination problem.
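To make the notion of dominance between the two equilibria concrete, the following minimal sketch compares them on a single static two-player game. It is an illustration only, not the paper's algorithm: the payoff matrices `A` and `B`, the function names, the restriction to pure strategies, and the tie-breaking rule for the follower are all assumptions chosen for this example.

```python
from itertools import product

def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B).

    A[i][j] is the row player's payoff, B[i][j] the column player's payoff.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # Neither player can gain by deviating unilaterally.
        row_is_best = all(A[i][j] >= A[k][j] for k in range(rows))
        col_is_best = all(B[i][j] >= B[i][l] for l in range(cols))
        if row_is_best and col_is_best:
            equilibria.append((i, j))
    return equilibria

def stackelberg_row_leader(A, B):
    """Return the pure Stackelberg outcome (i, j) with the row player as leader.

    The leader commits to a row; the follower answers with a best response.
    Assumption: follower ties are broken in favor of the lowest column index.
    """
    rows, cols = len(A), len(A[0])
    best = None
    for i in range(rows):
        # Follower's best response to row i (max returns the first maximizer).
        j = max(range(cols), key=lambda l: B[i][l])
        if best is None or A[i][j] > A[best[0]][best[1]]:
            best = (i, j)
    return best

# Hypothetical payoffs: row 0 is dominant for the row player,
# yet committing to row 1 steers the follower to a better cell.
A = [[2, 4],
     [1, 3]]
B = [[1, 0],
     [0, 2]]

nash = pure_nash(A, B)
stack = stackelberg_row_leader(A, B)
```

In this toy game the only pure Nash equilibrium is cell (0, 0) with payoffs (2, 1), while the Stackelberg outcome with the row player leading is cell (1, 1) with payoffs (3, 2), so both players do better under the Stackelberg solution. Other payoff matrices can reverse this ordering, which is why choosing between the two equilibria game by game can matter.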
