Solving Continuous-Time Transition-Independent DEC-MDP with Temporal Constraints

Despite the impact of DEC-MDPs over the past decade, scaling to large problem domains has been difficult to achieve. The scale-up problem is exacerbated in DEC-MDPs with continuous states, which are critical in domains involving time; the latest algorithm (M-DPFP) does not scale up beyond two agents and a handful of unordered tasks per agent. This paper is focused on meeting this challenge in continuous resource DEC-MDPs with two predominant contributions. First, it introduces a novel continuous time model for multi-agent planning problems that exploits transition independence in domains with graphical agent dependencies and temporal constraints. More importantly, it presents a new, iterative, locally optimal algorithm called SPAC that is a combination of the following key ideas: (1) defining a novel augmented CT-MDP such that solving this single-agent continuous time MDP provably provides an automatic best response to neighboring agents' policies; (2) fast convolution to efficiently generate such augmented MDPs; (3) a new enhanced lazy approximation algorithm to solve these augmented MDPs; (4) intelligent seeding of initial policies in the iterative process; (5) exploiting the graph structure of reward dependencies to leverage local interactions for scalability. Our experiments show SPAC not only finds solutions substantially faster than M-DPFP with comparable quality, but also scales well to large teams of agents.
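The outer structure described above — each agent iteratively solving an augmented single-agent MDP as a best response to its neighbors' fixed policies, starting from seeded policies and iterating until no agent improves — can be sketched as follows. This is a minimal illustration only, assuming hypothetical callbacks `solve_augmented_mdp`, `evaluate`, and `seed_policy` that stand in for the paper's augmented CT-MDP solver, policy evaluation, and seeding steps; none of these names come from the paper itself.

```python
def spac(agents, neighbors, solve_augmented_mdp, evaluate, seed_policy,
         max_iters=100):
    """Iterative best-response loop for transition-independent agents
    (locally optimal joint policy, in the spirit of SPAC's outer loop)."""
    # (4) intelligent seeding of initial policies
    policies = {a: seed_policy(a) for a in agents}
    values = {a: evaluate(a, policies) for a in agents}
    for _ in range(max_iters):
        improved = False
        for a in agents:
            # (1) build/solve the augmented MDP for agent a given only its
            # neighbors' policies -- (5) the reward-dependency graph means
            # non-neighbors cannot affect a's best response
            ctx = {n: policies[n] for n in neighbors[a]}
            candidate = solve_augmented_mdp(a, ctx)
            trial = dict(policies, **{a: candidate})
            v = evaluate(a, trial)
            if v > values[a] + 1e-9:  # strict improvement check
                policies[a], values[a] = candidate, v
                improved = True
        if not improved:  # no agent can improve: local optimum reached
            break
    return policies
```

With transition independence, each best-response computation touches only a single agent's (continuous-time) MDP, which is what makes the per-iteration cost depend on neighborhood size rather than team size.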
