Paper information - Recursive Small-Step Multi-Agent A* for Dec-POMDPs

Recursive Small-Step Multi-Agent A* for Dec-POMDPs

We present recursive small-step multi-agent A* (RS-MAA*), an exact algorithm that optimizes the expected reward in decentralized partially observable Markov decision processes (Dec-POMDPs). RS-MAA* builds on multi-agent A* (MAA*), an algorithm that finds policies by exploring a search tree, but tackles two major scalability concerns. First, we employ a modified, small-step variant of the search tree that avoids the doubly exponential outdegree of the classical formulation. Second, we use a tight and recursive heuristic that we compute on the fly, thereby avoiding an expensive precomputation. The resulting algorithm is conceptually simple, yet it shows superior performance on a rich set of standard benchmarks.
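The outdegree claim can be made concrete with a small counting sketch. It is not the authors' implementation. It assumes every agent has the same number of actions and observations, that a classical node picks a complete joint decision rule for one stage, and that a small-step node picks the action for a single observation history of a single agent. All function names are hypothetical.

```python
def classical_outdegree(num_agents, num_actions, num_observations, stage):
    """Children of a node when a whole joint decision rule is chosen at once.

    Assumption: at stage `stage` (starting from 0) each agent has
    num_observations ** stage observation histories, and a decision rule
    assigns one action to each of them.
    """
    histories_per_agent = num_observations ** stage
    return num_actions ** (num_agents * histories_per_agent)


def small_steps_per_stage(num_agents, num_observations, stage):
    """Number of single-history decisions needed to cover one stage."""
    return num_agents * num_observations ** stage


def small_step_outdegree(num_actions):
    """Children of a node when only one history of one agent is decided."""
    return num_actions


if __name__ == "__main__":
    # Hypothetical problem size: 2 agents, 3 actions, 2 observations, stage 2.
    print(classical_outdegree(2, 3, 2, 2))   # one wide branching step
    print(small_steps_per_stage(2, 2, 2))    # many narrow branching steps
    print(small_step_outdegree(3))
```

Under these assumptions both trees reach the same set of joint decision rules for a stage, but the small-step tree does so through a chain of low-outdegree nodes, so a heuristic gets the chance to prune partially specified decision rules.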

Sebastian Junges | N. Jansen | Wietze Koops | Thiago D. Simão

[1] Olivier Buffet, et al. Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing, 2020, ICML.

[2] Jan Peters, et al. Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement, 2020, Autonomous Agents and Multi-Agent Systems.

[3] Jan Peters, et al. Information Gathering in Decentralized POMDPs by Policy Graph Improvement, 2019, AAMAS.

[4] Frans A. Oliehoek, et al. A Concise Introduction to Decentralized POMDPs, 2016, SpringerBriefs in Intelligent Systems.

[5] Olivier Buffet, et al. Optimally Solving Dec-POMDPs as Continuous-State MDPs, 2022, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.

[6] Shimon Whiteson, et al. Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs, 2013, J. Artif. Intell. Res.

[7] Frans A. Oliehoek, et al. Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion, 2011, IJCAI.

[8] Tristan Cazenave, et al. Partial Move A*, 2010, 2010 22nd IEEE International Conference on Tools with Artificial Intelligence.

[9] Shlomo Zilberstein, et al. Incremental Policy Generation for Finite-Horizon DEC-POMDPs, 2009, ICAPS.

[10] Shimon Whiteson, et al. Lossless clustering of histories in decentralized POMDPs, 2009, AAMAS.

[11] Shlomo Zilberstein, et al. Achieving goals in decentralized POMDPs, 2009, AAMAS.

[12] Francisco S. Melo, et al. Interaction-driven Markov games for decentralized multiagent planning under uncertainty, 2008, AAMAS.

[13] Nikos A. Vlassis, et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs, 2008, J. Artif. Intell. Res.

[14] Shlomo Zilberstein, et al. Optimizing Memory-Bounded Controllers for Decentralized POMDPs, 2007, UAI.

[15] Shlomo Zilberstein, et al. Memory-Bounded Dynamic Programming for DEC-POMDPs, 2007, IJCAI.

[16] François Charpillet, et al. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, 2005, UAI.

[17] Shlomo Zilberstein, et al. Dynamic Programming for Partially Observable Stochastic Games, 2004, AAAI.

[18] David V. Pynadath, et al. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings, 2003, IJCAI.

[19] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.

[20] Michael Wooldridge, et al. Game theoretic and decision theoretic agents, 2000, The Knowledge Engineering Review.

[21] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[22] Artificial Intelligence: A Modern Approach, 3rd Edition, 2021.

[23] Jonathan P. How, et al. Modeling and Planning with Macro-Actions in Decentralized POMDPs, 2019, J. Artif. Intell. Res.

[24] Frans A. Oliehoek, et al. The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems, 2015, AAAI Fall Symposia.

[25] Nikos A. Vlassis, et al. Q-value Heuristics for Approximate Solutions of Dec-POMDPs, 2007, AAAI Spring Symposium: Game Theoretic and Decision Theoretic Agents.

[26] S. Zilberstein, et al. Optimal Fixed-Size Controllers for Decentralized POMDPs, 2006.

[27] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.

[28] Akiyoshi Shioura, et al. Mathematics of operations research, 1998.