MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs

We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents operating in a stochastic environment, with applications such as multi-robot coordination, network traffic control, and distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages over existing exact algorithms. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions, such as an approach to solving infinite-horizon problems.
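
To make the search structure concrete, below is a minimal Python sketch of the best-first skeleton that MAA*-style search follows: search nodes are partial joint policies q, ordered by an upper bound F(q) = V(q) + H(q), where V(q) is the exact expected value of the first t steps and H(q) is an admissible (optimistic) estimate of the value obtainable over the remaining steps. Everything here is illustrative rather than the paper's implementation: the toy two-agent reward function is invented, and joint policies are collapsed to open-loop action sequences to keep the sketch short, whereas real MAA* expands policy trees conditioned on observation histories and typically derives H from the underlying centralized MDP or POMDP.

```python
import heapq
import itertools

# Hypothetical toy problem: two agents, two actions, deterministic rewards.
ACTIONS = ["a", "b"]
HORIZON = 3

def step_reward(joint_action):
    # Agents are rewarded most for coordinating on action "a".
    if joint_action == ("a", "a"):
        return 2.0
    return 1.0 if joint_action[0] == joint_action[1] else 0.0

def exact_value(prefix):
    # V(q): exact value of the first len(prefix) steps of the joint policy.
    return sum(step_reward(ja) for ja in prefix)

def heuristic(prefix):
    # H(q): admissible (optimistic) bound -- the best achievable one-step
    # reward for each remaining step.  In real MAA*, this role is played by
    # e.g. the value of the underlying centralized MDP from step t onward.
    best = max(step_reward(ja) for ja in itertools.product(ACTIONS, repeat=2))
    return (HORIZON - len(prefix)) * best

def maa_star():
    # Open list ordered by F(q) = V(q) + H(q), highest first
    # (heapq is a min-heap, so keys are negated).
    root = ()
    open_list = [(-(exact_value(root) + heuristic(root)), root)]
    while open_list:
        _, prefix = heapq.heappop(open_list)
        if len(prefix) == HORIZON:
            # With an admissible heuristic, the first fully specified
            # policy popped from the queue is optimal.
            return exact_value(prefix), prefix
        # Expand: extend the partial policy by every next joint decision.
        for ja in itertools.product(ACTIONS, repeat=2):
            child = prefix + (ja,)
            f = exact_value(child) + heuristic(child)
            heapq.heappush(open_list, (-f, child))
    return None

if __name__ == "__main__":
    value, policy = maa_star()
    print("optimal value:", value, "policy:", policy)
```

Because H never underestimates the remaining value, the first complete policy removed from the open list is guaranteed optimal, by the same argument that establishes the optimality of classical A*.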
