Planning with continuous resources for agent teams

Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents' action choices or interactions (e.g., by assuming transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent-team interaction graph to identify and prune suboptimal policies while allowing agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide an error bound on solution quality, and (iii) it focuses the policy search on the most relevant parts of this space, allowing a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high-quality solutions and exhibits superior performance compared with a discretization-based approach. We also show that M-DPFP can solve problems that are beyond the scope of existing approaches.
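
To give the flavor of features (ii) and (iii), the following Python sketch illustrates one generic ingredient of error-bounded planning with continuous resources: an adaptive piecewise-constant approximation of a function of a continuous resource (e.g., remaining time), refined only where the function varies most, with a per-segment error bound. This is an illustrative sketch only, not the authors' M-DPFP algorithm; the names `Segment`, `approximate`, and `completion_prob` are hypothetical and all numeric settings are assumptions.

```python
"""Illustrative sketch: adaptive, error-bounded approximation over a
continuous resource, in the spirit of avoiding uniform discretization.
NOT the M-DPFP algorithm itself."""
from dataclasses import dataclass
from typing import Callable, List, Tuple
import math


@dataclass
class Segment:
    lo: float      # left endpoint of the resource interval
    hi: float      # right endpoint of the resource interval
    value: float   # constant approximation of f on [lo, hi)
    error: float   # sample-based upper bound on the error in this segment


def approximate(f: Callable[[float], float],
                lo: float, hi: float,
                max_error: float,
                samples: int = 32) -> List[Segment]:
    """Greedy piecewise-constant approximation of f on [lo, hi].

    A segment is split only when the sampled variation of f exceeds
    max_error, so refinement effort is concentrated on the parts of the
    continuous space where it actually matters.
    """
    stack: List[Tuple[float, float]] = [(lo, hi)]
    result: List[Segment] = []
    while stack:
        a, b = stack.pop()
        xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
        ys = [f(x) for x in xs]
        err = (max(ys) - min(ys)) / 2.0  # midpoint-constant error bound (sampled)
        if err <= max_error or (b - a) < 1e-6:
            result.append(Segment(a, b, (max(ys) + min(ys)) / 2.0, err))
        else:
            mid = (a + b) / 2.0          # refine only this segment
            stack.extend([(a, mid), (mid, b)])
    return sorted(result, key=lambda s: s.lo)


if __name__ == "__main__":
    # Hypothetical example: probability of finishing a task as a function of
    # the remaining continuous resource (e.g., time left before a deadline).
    completion_prob = lambda t: 1.0 - math.exp(-3.0 * t)

    segments = approximate(completion_prob, 0.0, 1.0, max_error=0.02)
    worst = max(s.error for s in segments)
    print(f"{len(segments)} segments, per-segment error bound <= {worst:.3f}")
```

In this toy example the segments cluster near t = 0, where the function changes fastest, rather than being spread uniformly; that is the kind of focused, quality-bounded refinement the abstract contrasts with plain discretization.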
