A POMDP Approach to Token-Based Team Coordination

Efficient coordination among large numbers of heterogeneous agents promises to revolutionize the way in which some complex tasks, such as responding to urban disasters, can be performed. Token-based approaches have been shown to be a novel and promising way to achieve such coordination. However, previous token-based algorithms were built on heuristics and did not explicitly consider the utilities of token movements or changes in team state. In this paper we put forward an algorithm that uses team rewards to improve token routing decisions. The ideal model of this token movement is a centralized Markov Decision Process (MDP) over joint activities. Unfortunately, the assumptions underlying this model are not feasible for large-team coordination, so we make several approximations. First, we decentralize the centralized MDP into a set of standard MDPs with independent individual activities. Each of these MDPs is then approximated by a Partially Observable Markov Decision Process (POMDP), because agents in a large team may not know the exact states of their teammates or of the environment. A logical team organization is imposed to limit token passing to an agent's immediate neighbors. The belief states of the POMDP model are efficiently estimated using a Monte Carlo sampling process.
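To make the approach concrete, the sketch below illustrates one way the Monte Carlo belief estimation and reward-driven token routing described above could be realized. It is a minimal illustration under stated assumptions, not the authors' implementation: the functions `transition`, `likelihood`, and `expected_reward`, and the particle-filter structure, are hypothetical names chosen for exposition.

```python
import random

# Hypothetical sketch: each agent keeps a sampled (particle) belief over
# the unobserved team/environment state and routes a token to the
# neighbor with the highest expected team reward under that belief.

def update_belief(particles, observation, transition, likelihood, n=100):
    """Monte Carlo belief update: propagate each particle through the
    (assumed) state-transition model, weight the propagated particles by
    the observation likelihood, and resample n new particles."""
    propagated = [transition(s) for s in particles]
    weights = [likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0:  # degenerate observation: fall back to the prior
        return propagated
    return random.choices(propagated, weights=weights, k=n)

def route_token(token, neighbors, particles, expected_reward):
    """Greedy one-step routing: send the token to the neighbor whose
    expected team reward, averaged over the sampled belief, is highest."""
    def value(neighbor):
        return sum(expected_reward(token, neighbor, s)
                   for s in particles) / len(particles)
    return max(neighbors, key=value)
```

In this reading, an agent never reasons over the full joint state: it compares only its logical neighbors, averaged over a small particle set, which mirrors the paper's restriction of token passing to an agent's immediate neighborhood.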
