论文信息 - Point-based backup for decentralized POMDPs: complexity and new algorithms - 字舞流文

Point-based backup for decentralized POMDPs: complexity and new algorithms

Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.

Shlomo Zilberstein | Akshat Kumar | S. Zilberstein | Akshat Kumar

[1] Rina Dechter,et al. AND/OR Branch-and-Bound for Graphical Models , 2005, IJCAI.

[2] Rina Dechter,et al. Bucket Elimination: A Unifying Framework for Reasoning , 1999, Artif. Intell..

[3] Jeff G. Schneider,et al. Approximate solutions for partially observable stochastic games with common payoffs , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[4] John N. Tsitsiklis,et al. On the complexity of decentralized decision making and detection problems , 1985 .

[5] Shlomo Zilberstein,et al. Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs , 2007, UAI.

[6] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..

[7] Joelle Pineau,et al. Anytime Point-Based Approximations for Large POMDPs , 2006, J. Artif. Intell. Res..

[8] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.

[9] Brahim Chaib-draa,et al. Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs , 2009, AAMAS.

[10] Makoto Yokoo,et al. Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs , 2005, IJCAI.

[11] Randy Cogill,et al. An Approximation Algorithm for the Discrete Team Decision Problem , 2006, SIAM J. Control. Optim..

[12] Claudia V. Goldman,et al. Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res..

[13] Shlomo Zilberstein,et al. Incremental Policy Generation for Finite-Horizon DEC-POMDPs , 2009, ICAPS.

[14] Simon de Givry,et al. Existential arc consistency: Getting closer to full arc consistency in weighted CSPs , 2005, IJCAI.

[15] Shlomo Zilberstein,et al. Constraint-based dynamic programming for decentralized POMDPs with structured interactions , 2009, AAMAS.

[16] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[17] Shlomo Zilberstein,et al. Value-based observation compression for DEC-POMDPs , 2008, AAMAS.

[18] Shlomo Zilberstein,et al. Memory-Bounded Dynamic Programming for DEC-POMDPs , 2007, IJCAI.

[19] Makoto Yokoo,et al. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings , 2003, IJCAI.