Recursive Small-Step Multi-Agent A* for Dec-POMDPs

We present recursive small-step multi-agent A* (RS-MAA*), an exact algorithm that optimizes the expected reward in decentralized partially observable Markov decision processes (Dec-POMDPs). RS-MAA* builds on multi-agent A* (MAA*), an algorithm that finds policies by exploring a search tree, but addresses two major scalability concerns. First, we employ a modified, small-step variant of the search tree that avoids the doubly exponential outdegree of the classical formulation. Second, we use a tight, recursive heuristic that we compute on the fly, thereby avoiding an expensive precomputation. The resulting algorithm is conceptually simple, yet it shows superior performance on a rich set of standard benchmarks.
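To make the first scalability concern concrete, the following Python sketch compares the branching factor of a classical MAA*-style search tree with that of a small-step tree. It rests on an assumption the abstract does not spell out: that a classical node expands into one child per joint decision rule for the current stage, while a small step fixes the action of a single agent for a single local observation history. The function names are hypothetical and serve only as illustration.

```python
def classical_outdegree(n_agents: int, n_actions: int, n_obs: int, stage: int) -> int:
    """Children per node when a whole joint decision rule is chosen at once.

    Each agent has n_obs ** stage local observation histories at this stage,
    and the rule assigns one action to every (agent, history) pair.
    """
    histories_per_agent = n_obs ** stage
    return n_actions ** (n_agents * histories_per_agent)


def small_step_outdegree(n_actions: int) -> int:
    """Children per node when one action is fixed for one (agent, history) pair."""
    return n_actions


def small_steps_per_stage(n_agents: int, n_obs: int, stage: int) -> int:
    """Tree levels needed to fix a full joint decision rule in small steps."""
    return n_agents * n_obs ** stage


if __name__ == "__main__":
    # Illustrative sizes only: 2 agents, 2 actions, 2 observations.
    for stage in range(4):
        print(
            stage,
            classical_outdegree(2, 2, 2, stage),
            small_step_outdegree(2),
            small_steps_per_stage(2, 2, stage),
        )
```

Under this assumption the small-step tree trades width for depth: the outdegree stays constant at the number of actions, while the number of levels per stage grows with the number of observation histories.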
