Collaborative multi-agent reinforcement learning based on a novel coordination tree framework with dynamic partition

In research on team Markov games, the two main problems are dynamically computing the coordination teams and determining the joint action policy. To address the first problem, we propose a dynamic team partitioning method based on a novel coordination tree framework: subsets of cooperating agents form the nodes of the tree, and two kinds of weights are defined that describe the cost of an agent collaborating with, or without, a given agent subset. Based on the coordination tree, each agent selects the agent subset with the minimal cost as its coordination team. To address the second problem, a Q-learning algorithm based on belief allocation learns the multi-agent joint action policy and helps the cooperative agents' joint action policy converge to the optimal solution. We evaluate the proposed algorithms in multiple simulation environments and compare them with similar approaches. Experimental results show that the proposed algorithms dynamically compute the cooperative teams and derive the optimal joint action policy for those teams.
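The team-partition step described above can be illustrated with a minimal sketch. The cost functions `coop_cost` and `solo_cost` below are hypothetical placeholders standing in for the paper's two coordination-tree weights (the cost of collaborating with, or without, an agent subset); the greedy traversal is one plausible reading of "each agent selects the agent subset with the minimal cost", not the paper's exact procedure.

```python
from itertools import combinations

def partition_teams(agents, coop_cost, solo_cost):
    """Greedy sketch of dynamic team partition over a coordination tree.

    coop_cost(agent, subset) -> cost of `agent` collaborating with `subset`
    solo_cost(agent)         -> cost of `agent` acting without a team
    Both functions are assumed placeholders for the two weights in the paper.
    """
    teams = []
    unassigned = list(agents)
    while unassigned:
        a = unassigned.pop(0)
        # Candidate tree nodes: non-empty subsets of the remaining agents.
        candidates = [frozenset(c)
                      for r in range(1, len(unassigned) + 1)
                      for c in combinations(unassigned, r)]
        best = min(candidates, key=lambda s: coop_cost(a, s), default=None)
        if best is not None and coop_cost(a, best) < solo_cost(a):
            # Joining the minimal-cost subset is cheaper than acting alone.
            teams.append({a} | set(best))
            unassigned = [x for x in unassigned if x not in best]
        else:
            teams.append({a})
    return teams

# Toy usage: agents prefer subsets whose mean index is close to their own.
coop = lambda a, s: abs(a - sum(s) / len(s))
solo = lambda a: 1.5
print(partition_teams([0, 1, 2, 3], coop, solo))  # → [{0, 1}, {2, 3}]
```

Enumerating all subsets is exponential in the number of remaining agents; the coordination tree in the paper presumably exists precisely to prune this search, which the toy sketch does not attempt.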
