Asymmetric multiagent reinforcement learning

This paper introduces a novel method for asymmetric multiagent reinforcement learning. The method addresses learning tasks in which the information states of the agents are not equal: some agents (leaders) know how their opponents (followers) will select their actions and, based on this information, encourage the followers to select actions that lead to improved payoffs for the leaders. This kind of configuration arises, e.g., in semi-centralized multiagent systems with an external global utility associated with the system. We present a brief literature survey of multiagent reinforcement learning based on Markov games and then construct an asymmetric learning method that utilizes the theory of Markov games. Finally, we test the proposed method on a simple example application.
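The asymmetric setting described above is closely related to leader-follower (Stackelberg-style) play in Markov games. The sketch below is only an illustration of that idea under stated assumptions, not the paper's algorithm: a hypothetical `AsymmetricQLearner` keeps a joint-action Q-table per agent, selects stage-game actions by letting the leader commit first while assuming a best-responding follower, and evaluates the next state at the resulting action pair in the temporal-difference targets. All class, function, and parameter names are illustrative assumptions.

```python
# Illustrative leader-follower Q-learning sketch for a two-agent Markov game.
# This is NOT the paper's algorithm; names and structure are assumptions.
import numpy as np


class AsymmetricQLearner:
    def __init__(self, n_states, n_leader_actions, n_follower_actions,
                 alpha=0.1, gamma=0.95):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # One joint-action Q-table per agent: Q[state, leader_action, follower_action].
        self.q_leader = np.zeros((n_states, n_leader_actions, n_follower_actions))
        self.q_follower = np.zeros((n_states, n_leader_actions, n_follower_actions))

    def stackelberg_actions(self, state):
        """Leader commits to the action maximizing its value, assuming the
        follower best-responds to the announced leader action."""
        # Follower's best response to each possible leader action.
        follower_reply = np.argmax(self.q_follower[state], axis=1)
        leader_idx = np.arange(self.q_leader.shape[1])
        leader_values = self.q_leader[state, leader_idx, follower_reply]
        a_leader = int(np.argmax(leader_values))
        a_follower = int(follower_reply[a_leader])
        return a_leader, a_follower

    def update(self, s, a_l, a_f, r_l, r_f, s_next):
        """TD update that evaluates the next state at its leader-follower
        action pair rather than at independent per-agent maxima."""
        nl, nf = self.stackelberg_actions(s_next)
        target_l = r_l + self.gamma * self.q_leader[s_next, nl, nf]
        target_f = r_f + self.gamma * self.q_follower[s_next, nl, nf]
        self.q_leader[s, a_l, a_f] += self.alpha * (target_l - self.q_leader[s, a_l, a_f])
        self.q_follower[s, a_l, a_f] += self.alpha * (target_f - self.q_follower[s, a_l, a_f])


# Tiny usage example: one learning step in a single-state toy game.
learner = AsymmetricQLearner(n_states=1, n_leader_actions=2, n_follower_actions=2)
a_l, a_f = learner.stackelberg_actions(0)
learner.update(0, a_l, a_f, r_l=1.0, r_f=0.5, s_next=0)
```

The design point this sketch is meant to convey is the asymmetry itself: the follower's choice is a function of the leader's announced action, so the leader optimizes over its own actions given that response function, rather than both agents maximizing independently as in symmetric equilibrium learners.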
