Incremental Policy Generation for Finite-Horizon DEC-POMDPs

Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved with dynamic programming, but the high resource usage of current approaches limits their scalability. To improve the efficiency of dynamic programming algorithms, we propose a new backup algorithm based on a reachability analysis of the state space. This method, which we call incremental policy generation, can produce an optimal solution for any possible initial state, and it can achieve further scalability by exploiting a known start state. When incorporated into the optimal dynamic programming algorithm, it markedly reduces resource consumption, allowing the planning horizon to be increased in our experiments. The approach also fits naturally with approximate dynamic programming algorithms. To demonstrate this, we incorporate it into the state-of-the-art PBIP algorithm and show significant performance gains. These results suggest that the performance of other dynamic programming algorithms for DEC-POMDPs could be similarly improved by integrating the incremental policy generation approach.
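
To make the reachability idea concrete, below is a minimal sketch (not the authors' implementation) of the core computation that incremental policy generation relies on: determining which states can actually occur after a given action and observation, so that backed-up subpolicies only need value entries over that restricted state set. The data layout (dictionaries `T` and `O`, the function `reachable_states`) is a hypothetical toy encoding chosen for illustration.

```python
# Sketch of the state-reachability restriction behind incremental policy
# generation (hypothetical data layout; not the paper's actual code).
# When building subpolicies that follow action `a` and observation `o`,
# value vectors only need to cover states reachable after (a, o),
# which shrinks the dynamic programming backup.

def reachable_states(prior_states, joint_action, joint_obs, T, O, states):
    """Return the set of states with nonzero probability of occurring
    after taking `joint_action` from some state in `prior_states` and
    then receiving `joint_obs`."""
    nxt = set()
    for s_next in states:
        for s in prior_states:
            # T[(s, a, s_next)] = P(s_next | s, a)
            # O[(a, s_next, o)] = P(o | a, s_next)
            p_trans = T.get((s, joint_action, s_next), 0.0)
            p_obs = O.get((joint_action, s_next, joint_obs), 0.0)
            if p_trans * p_obs > 0.0:
                nxt.add(s_next)
                break
    return nxt

# Tiny example: 3 states; reachability depends on the action and observation.
states = ["s0", "s1", "s2"]
T = {("s0", "a", "s1"): 1.0, ("s1", "a", "s1"): 1.0, ("s2", "a", "s2"): 1.0}
O = {("a", "s1", "o"): 1.0, ("a", "s2", "o"): 0.0}

# With a known start state s0, only s1 can occur after (a, o), so the backup
# needs value entries for s1 alone rather than the full state space.
print(reachable_states({"s0"}, "a", "o", T, O, states))  # {'s1'}
```

In the full algorithm this restriction is applied recursively during the backup, for each agent, action, and observation, and it is what allows either an optimal solution for any initial state or further savings when the start state is known.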
