Paper information - Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings

The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). Yet, despite the growing importance and applications of decentralized POMDP models in the multiagent arena, few algorithms have been developed for efficiently deriving joint policies for these models. This paper presents a new class of locally optimal algorithms called "Joint Equilibrium-based Search for Policies (JESP)". We first describe an exhaustive version of JESP and subsequently a novel dynamic programming approach to JESP. Our complexity analysis reveals the potential for exponential speedups due to the dynamic programming approach. These theoretical results are verified via empirical comparisons of the two JESP versions with each other and with a globally optimal brute-force search algorithm. Finally, we prove piecewise linearity and convexity (PWLC) properties, thus taking steps towards developing algorithms for continuous belief states.

Makoto Yokoo | Milind Tambe | Stacy Marsella | David V. Pynadath | Ranjit Nair

[1] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[2] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.

[3] Craig Boutilier, et al. Planning, Learning and Coordination in Multiagent Decision Processes, 1996, TARK.

[4] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search, 2000, UAI.

[5] Milind Tambe, et al. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models, 2002, J. Artif. Intell. Res.

[6] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.

[7] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.

[8] Victor R. Lesser, et al. Communication decisions in multi-agent cooperation: model and experiments, 2001, AGENTS '01.

[9] François Charpillet, et al. A heuristic approach for solving decentralized-POMDP: assessment on the pursuit problem, 2002, SAC '02.

[10] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.