Introducing Communication in Dis-POMDPs with Finite State Machines

Distributed Partially Observable Markov Decision Problems (Dis-POMDPs) are emerging as a popular approach for modeling sequential decision making in teams operating under uncertainty. Coherent team behavior requires appropriate run-time communication, and many run-time communication schemes for Dis-POMDPs have therefore been studied. Separately, a Finite State Machine (FSM) is a popular representation for a local policy that operates over a very long or infinite time horizon. In this paper, we examine a run-time communication scheme in which the local policy of each agent is represented as an FSM. In this scheme, the meaning of each message is not predefined; it is given implicitly by the interaction between local policies. We propose an iterative-improvement algorithm that searches for a joint policy in a setting where run-time communication incurs a cost, so that agents communicate only when doing so is cost-effective. Interestingly, our algorithm can find a joint policy that obtains a better expected reward than a hand-crafted joint policy while requiring fewer nodes in the local FSMs and fewer message types. Furthermore, we experimentally show that our algorithm can obtain a joint policy that consists of sufficiently complex local FSMs within a reasonable amount of time.
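To make the idea of FSM local policies with implicitly defined messages concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes two agents, deterministic FSMs whose transitions depend on the current node, the agent's own observation, and the message received from the other agent, and a fixed cost per message sent. All names (`Node`, `FSMPolicy`, `run_joint`, the message symbol `"m1"`, the actions and observations) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    action: str             # action taken while in this node
    message: Optional[str]  # message sent from this node; None means silence


@dataclass
class FSMPolicy:
    nodes: Dict[int, Node]
    # (current node, own observation, received message) -> next node
    transitions: Dict[Tuple[int, str, Optional[str]], int]
    start: int = 0


def run_joint(policies: List[FSMPolicy],
              observations: List[Tuple[str, str]],
              comm_cost: float):
    """Run two FSM policies side by side (assumed two-agent setting).

    Returns the sequence of joint actions and the total communication cost.
    """
    current = [p.start for p in policies]
    trace, total_cost = [], 0.0
    for obs in observations:
        nodes = [p.nodes[n] for p, n in zip(policies, current)]
        messages = [nd.message for nd in nodes]
        trace.append(tuple(nd.action for nd in nodes))
        # Every non-silent message incurs the same (assumed) cost.
        total_cost += comm_cost * sum(m is not None for m in messages)
        # Each agent transitions on its own observation and the other's message.
        current = [
            p.transitions[(current[i], obs[i], messages[1 - i])]
            for i, p in enumerate(policies)
        ]
    return trace, total_cost


# Agent 0 observes "left"/"right" and sends "m1" one step after seeing "left".
scout = FSMPolicy(
    nodes={0: Node("listen", None), 1: Node("listen", "m1")},
    transitions={(0, "left", None): 1, (0, "right", None): 0,
                 (1, "left", None): 0, (1, "right", None): 0},
)
# Agent 1 has no informative observation; "m1" has no predefined meaning,
# it simply triggers the transition into the node that opens the right door.
actor = FSMPolicy(
    nodes={0: Node("wait", None), 1: Node("open-right", None)},
    transitions={(0, "none", None): 0, (0, "none", "m1"): 1,
                 (1, "none", None): 0, (1, "none", "m1"): 1},
)

trace, cost = run_joint(
    [scout, actor],
    [("left", "none"), ("right", "none"), ("right", "none")],
    comm_cost=0.5,
)
```

In this sketch the symbol `"m1"` acquires its meaning only from how the receiving FSM reacts to it, and the single message sent is the only contribution to the communication cost. A search procedure such as the one described above would adjust the nodes, transitions, and message assignments rather than fix them by hand.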
