Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but their NEXP-complete complexity hinders scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs and TREMOR, an algorithm for solving them. TREMOR's primary novelty is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
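To make the shaping idea concrete, below is a minimal, self-contained sketch of a TREMOR-style planning loop: each agent is planned for individually, states where individual policies would interact are detected, and each agent's reward model is shaped to reflect those interactions before replanning. The helper names (solve_single_agent_pomdp, find_coordination_locales, shape_reward) and the toy greedy "solver" are hypothetical stand-ins for illustration, not the paper's API or its actual POMDP solver.

    # Illustrative sketch of social model shaping for implicit coordination.
    # All names and the toy solver are assumptions; a real system would call
    # an actual single-agent POMDP solver in place of the greedy stand-in.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple


    @dataclass
    class AgentModel:
        """Per-agent model; only the (state, action) -> reward table is used here."""
        name: str
        reward: Dict[Tuple[str, str], float] = field(default_factory=dict)


    def solve_single_agent_pomdp(model: AgentModel) -> Dict[str, str]:
        """Stand-in for a single-agent POMDP solver: pick the best action per state."""
        policy: Dict[str, str] = {}
        for (state, action), r in sorted(model.reward.items()):
            if state not in policy or r > model.reward[(state, policy[state])]:
                policy[state] = action
        return policy


    def find_coordination_locales(policies: Dict[str, Dict[str, str]]) -> List[str]:
        """Stand-in: states where two agents' individual policies would collide."""
        seen: Dict[str, str] = {}
        locales: List[str] = []
        for agent, policy in policies.items():
            for state, action in policy.items():
                if action == "enter" and state in seen:
                    locales.append(state)
                seen.setdefault(state, agent)
        return locales


    def shape_reward(model: AgentModel, locales: List[str], penalty: float = -2.0) -> None:
        """Social model shaping: fold the cost of interacting into the individual model."""
        for state in locales:
            key = (state, "enter")
            if key in model.reward:
                model.reward[key] += penalty


    def tremor_like_planning(models: List[AgentModel], max_iters: int = 10):
        """Iterate: plan individually, detect interactions, shape models, replan."""
        policies = {m.name: solve_single_agent_pomdp(m) for m in models}
        for _ in range(max_iters):
            locales = find_coordination_locales(policies)
            if not locales:
                break  # no conflicting interactions remain
            for m in models:
                shape_reward(m, locales)
            policies = {m.name: solve_single_agent_pomdp(m) for m in models}
        return policies


    if __name__ == "__main__":
        # Two agents both prefer a narrow corridor until shaping makes it costly
        # for one of them, at which point that agent switches to waiting.
        a = AgentModel("a1", {("corridor", "enter"): 5.0, ("corridor", "wait"): 1.0})
        b = AgentModel("a2", {("corridor", "enter"): 4.0, ("corridor", "wait"): 1.0})
        print(tremor_like_planning([a, b]))

In this toy run the repeated penalty eventually makes agent a2 prefer waiting, so the two individually computed policies no longer conflict; that is the sense in which shaping the individual models yields implicit coordination without a joint planner.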
