Paper information

On opportunistic techniques for solving decentralized Markov decision processes with temporal constraints

Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints, but they are very difficult to solve. In this paper, we improve a state-of-the-art heuristic solution method for DEC-MDPs, called OC-DEC-MDP, that has recently been shown to scale up to larger DEC-MDPs. Our heuristic solution method, called Value Function Propagation (VFP), combines two orthogonal improvements of OC-DEC-MDP. First, it speeds up OC-DEC-MDP by an order of magnitude by maintaining and manipulating a value function for each state (as a function of time) rather than a separate value for each pair of state and time interval. Second, it achieves higher solution quality than OC-DEC-MDP because, as our analytical results show, it does not overestimate the expected total reward as OC-DEC-MDP does. We test both improvements independently in a crisis-management domain as well as in other types of domains. Our experimental results demonstrate a significant speedup of VFP over OC-DEC-MDP as well as higher solution quality in a variety of situations.
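The central representational idea in the abstract, keeping one value function of time per state instead of one number per (state, time interval) pair, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' VFP algorithm: the single chain of methods, integer time steps, the hypothetical `propagate` helper, and the reward, deadline, and duration parameters are all illustrative inventions. Each value function V(t) gives the expected reward of starting a method at time t, and it is built backward from the successor's value function.

```python
from typing import Callable, Dict

# Hypothetical representation: a value function maps a start time t to expected reward.
ValueFn = Callable[[int], float]


def propagate(reward: float, deadline: int,
              durations: Dict[int, float], next_value: ValueFn) -> ValueFn:
    """Build V(t) for one method from its successor's value function.

    Assumed (hypothetical) model: the method takes duration d with probability
    durations[d]; it earns `reward` only if it finishes by `deadline`, and the
    successor then starts at the finish time and contributes next_value(t + d).
    If the deadline is missed, the chain earns nothing further.
    """
    def value(t: int) -> float:
        total = 0.0
        for d, p in durations.items():
            finish = t + d
            if finish <= deadline:
                total += p * (reward + next_value(finish))
        return total
    return value


def terminal(t: int) -> float:
    # After the last method in the chain, no further reward is available.
    return 0.0


# Hypothetical two-method chain A -> B, propagated backward from the end.
v_b = propagate(reward=10.0, deadline=6, durations={1: 0.5, 2: 0.5}, next_value=terminal)
v_a = propagate(reward=5.0, deadline=4, durations={2: 0.5, 3: 0.5}, next_value=v_b)

for t in range(4):
    print(t, v_a(t))  # prints 0 15.0, 1 15.0, 2 7.5, 3 0.0 (one pair per line)
```

Because each value function is a closure over its successor, V can be evaluated at any start time on demand; a table-based scheme would instead store and update one number per state and time interval. A practical implementation would likely use a compact function representation (for example, piecewise functions) rather than nested closures, which recompute successor values on every call.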

Milind Tambe | Janusz Marecki