Reasoning about joint beliefs for execution-time communication decisions

Just as POMDPs have been used to reason explicitly about uncertainty in single-agent systems, there has been recent interest in using multi-agent POMDPs to coordinate teams of agents in the presence of uncertainty. Although multi-agent POMDPs are known to be highly intractable, communication at every time step transforms a multi-agent POMDP into a more tractable single-agent POMDP. In this paper, we present an approach that generates "centralized" policies for multi-agent POMDPs at plan-time by assuming free communication, and then, at run-time, handles limited communication resources by reasoning about when communication is needed for effective execution. This approach trades additional computation at execution-time for more tractable policy generation at plan-time. In our algorithm, each agent maintains, at run-time, a distribution over the possible joint beliefs of the team. Joint actions are selected over this distribution, ensuring that the agents remain synchronized. Communication is used to integrate local observations into the team belief only when those observations would improve team performance. We show, through both a detailed example and experimental results, that our approach allows for effective decentralized execution while avoiding unnecessary communication.
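
To make the run-time loop concrete, the sketch below shows, in Python, one way an agent might maintain a distribution over possible joint beliefs, select a joint action over that distribution, and communicate only when its local observation would change that action. The two-state toy model, the one-step q_value heuristic, and every function name here are illustrative assumptions, not the paper's implementation; a real system would query the plan-time POMDP policy instead.

# Minimal sketch of the execution-time loop (illustrative assumptions only).

import itertools
import math

# Hypothetical two-state, tiger-style toy model shared by both agents.
STATES = ["left", "right"]                      # where the hidden hazard is
JOINT_ACTIONS = ["open-left", "open-right", "listen"]
OBSERVATIONS = ["hear-left", "hear-right"]      # each agent's noisy observation

# One-step joint reward R(joint action, state): open the door away from the hazard.
REWARD = {
    ("open-left", "left"): -50, ("open-left", "right"): 10,
    ("open-right", "left"): 10, ("open-right", "right"): -50,
    ("listen", "left"): -1, ("listen", "right"): -1,
}

# Per-agent observation model P(observation | state).
OBS_PROB = {
    ("hear-left", "left"): 0.85, ("hear-right", "left"): 0.15,
    ("hear-left", "right"): 0.15, ("hear-right", "right"): 0.85,
}


def q_value(belief, joint_action):
    # Expected one-step reward of a joint action under one joint belief
    # (stands in for the Q-values of the plan-time "centralized" policy).
    return sum(p * REWARD[(joint_action, s)] for s, p in belief.items())


def bayes_update(belief, observations):
    # Fold one or more observations into a belief; also return their likelihood.
    unnorm = {s: belief[s] * math.prod(OBS_PROB[(o, s)] for o in observations)
              for s in STATES}
    total = sum(unnorm.values())
    return {s: pr / total for s, pr in unnorm.items()}, total


def expand_joint_beliefs(dist, n_agents):
    # Grow the distribution over possible joint beliefs by one step:
    # one branch for every observation tuple the team might have received.
    new_dist = []
    for prob, belief in dist:
        for obs_tuple in itertools.product(OBSERVATIONS, repeat=n_agents):
            posterior, likelihood = bayes_update(belief, obs_tuple)
            new_dist.append((prob * likelihood, posterior))
    return new_dist


def select_joint_action(dist):
    # Every agent runs this same computation over the same distribution,
    # so the team stays synchronized without communicating.
    return max(JOINT_ACTIONS,
               key=lambda a: sum(p * q_value(b, a) for p, b in dist))


def should_communicate(dist, my_observation):
    # Communicate only if integrating the local observation into the team
    # belief would change the joint action the team is about to take.
    informed = []
    for p, b in dist:
        posterior, likelihood = bayes_update(b, [my_observation])
        informed.append((p * likelihood, posterior))
    return select_joint_action(informed) != select_joint_action(dist)


# Example: start from a uniform prior, expand one step, then act.
prior = {s: 1.0 / len(STATES) for s in STATES}
dist = expand_joint_beliefs([(1.0, prior)], n_agents=2)
print(select_joint_action(dist))                 # joint action kept in sync
print(should_communicate(dist, "hear-left"))     # worth broadcasting?

In this toy run, both agents independently select the same cautious joint action from the shared distribution, and an agent chooses to broadcast its observation only because folding that observation in would switch the team's choice.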
