Letting loose a SPIDER on a network of POMDPs: generating quality guaranteed policies

Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the number of agents, one popular line of work has focused on approximate solutions. Though efficient, such algorithms provide no guarantees on solution quality. A second, less popular line of work focuses on global optimality, but typical results are available only for two agents, and at considerable computational cost. This paper overcomes the limitations of both approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits the agent interaction structure given a network of agents (i.e., allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it supports quality-guaranteed approximations, enabling a systematic tradeoff of solution quality for time. Experimental results show orders-of-magnitude improvement in performance when compared with previous globally optimal algorithms.
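The abstract describes a quality-guaranteed branch-and-bound policy search but gives no pseudocode. Below is a minimal, hypothetical Python sketch of the generic idea it alludes to, not the authors' implementation: the names `Node`, `expand`, and `is_complete` are assumptions introduced for illustration, and the epsilon-pruning rule assumes the heuristic `upper_bound` never underestimates the value of the best completion of a partial joint policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Node:
    """A partial joint policy: policies fixed for a prefix of the agents.

    All names here are illustrative, not from the paper.
    """
    assigned: tuple      # policies chosen so far, one per agent handled
    lower_bound: float   # value of the best known completion
    upper_bound: float   # heuristic overestimate of the best completion


def branch_and_bound(root: Node,
                     expand: Callable[[Node], Iterable[Node]],
                     is_complete: Callable[[Node], bool],
                     epsilon: float = 0.0) -> Optional[Node]:
    """Best-first search with epsilon-pruning.

    If upper_bound is admissible (never underestimates), the returned
    solution is guaranteed to be within epsilon of the optimum, which is
    the sense in which quality can be traded for time.
    """
    best: Optional[Node] = None
    best_value = float("-inf")
    frontier: List[Node] = [root]
    while frontier:
        # Expand the most promising node (highest upper bound) first.
        frontier.sort(key=lambda n: n.upper_bound, reverse=True)
        node = frontier.pop(0)
        if node.upper_bound <= best_value + epsilon:
            # Prune: this subtree cannot beat the incumbent by more
            # than epsilon, so skipping it costs at most epsilon.
            continue
        if is_complete(node):
            if node.lower_bound > best_value:
                best, best_value = node, node.lower_bound
            continue
        frontier.extend(expand(node))
    return best
```

Setting `epsilon = 0` recovers an exact search; a positive `epsilon` prunes more aggressively and bounds the loss in solution quality, which mirrors the systematic quality-for-time tradeoff claimed in the abstract.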
