Learning of coordination: exploiting sparse interactions in multiagent systems

Creating coordinated multiagent policies in environments with uncertainty is a challenging problem, one that previous work has shown can be greatly simplified when the coordination needs are known in advance to be limited to specific parts of the state space. In this work, we assume that such needs are unknown and investigate how coordination can be learned in multiagent settings. We contribute a reinforcement-learning-based algorithm in which independent decision-makers/agents learn both their individual policies and when and how to coordinate. We focus on problems in which the interaction between the agents is sparse, and exploit this property to minimize the coupling of the learning processes of the different agents. We introduce a two-layer extension of Q-learning in which the action space of each agent is augmented with a coordination action that uses information from the other agents to decide the correct action. Our results show that the agents learn both to coordinate and to act independently, each in the regions of the state space where coordination is and is not needed.
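The abstract only sketches the mechanism at a high level; the following Python snippet illustrates one plausible reading of it, assuming a COORDINATE pseudo-action that, when selected, lets an agent observe the other agent's local state and act from a secondary Q-table over joint states. All names (SparseInteractionQAgent, resolve_coordination, the environment interface) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch of an independent Q-learner whose action set is
# augmented with a COORDINATE pseudo-action. Choosing COORDINATE lets the
# agent use joint-state information (e.g., the other agent's local state)
# to select its basic action from a secondary Q-table.
COORDINATE = "coordinate"

class SparseInteractionQAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions) + [COORDINATE]  # augmented action set
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_local = defaultdict(float)   # Q over (own state, augmented action)
        self.q_joint = defaultdict(float)   # Q over (joint state, basic action)

    def select_action(self, state):
        """Epsilon-greedy choice over the augmented action set."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_local[(state, a)])

    def resolve_coordination(self, joint_state):
        """When COORDINATE was chosen, pick a basic action from joint information."""
        basic = self.actions[:-1]
        if random.random() < self.epsilon:
            return random.choice(basic)
        return max(basic, key=lambda a: self.q_joint[(joint_state, a)])

    def update(self, state, action, reward, next_state,
               joint_state=None, basic_action=None):
        """One-step Q-learning update; also updates the joint-state layer
        when the transition was generated through COORDINATE."""
        best_next = max(self.q_local[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q_local[(state, action)] += \
            self.alpha * (target - self.q_local[(state, action)])
        if action == COORDINATE and joint_state is not None and basic_action is not None:
            self.q_joint[(joint_state, basic_action)] += \
                self.alpha * (target - self.q_joint[(joint_state, basic_action)])
```

If selecting COORDINATE carries a small cost (for example, a communication or perception penalty folded into the reward), the agent only learns to prefer it in the few states where the joint information improves its return, which is one way the learned interaction can remain sparse.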
