Reward shaping for valuing communications during multi-agent coordination

Decentralised coordination in multi-agent systems is typically achieved through communication. In many cases, however, communication is expensive: bandwidth may be limited, communicating may be dangerous, or communication may simply be unavailable at times. In this context, we argue for a rational approach to communication: if communicating has a cost, agents should be able to calculate its value, and thereby balance the need to communicate against the cost of doing so. In this research, we present a novel model of rational communication that uses reward shaping to value communications, and we employ this valuation in decentralised POMDP policy generation. Here, reward shaping is the process by which expectations over joint actions are adjusted according to how well coordinated the agent team is. An empirical evaluation of the benefits of this approach is presented in two domains. First, on an idealised benchmark problem, the multi-agent Tiger problem, our method is shown to require significantly less communication (up to 30% fewer messages) while still achieving a 30% performance improvement over the current state of the art. Second, on a larger-scale problem, RoboCupRescue, our method is shown to scale well and to operate without recourse to significant amounts of domain knowledge.

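To make the idea concrete, the following is a minimal, illustrative sketch of valuing a communication via reward shaping, not the paper's exact algorithm. It assumes the agent keeps its own belief over hidden states, an estimate of a teammate's (possibly stale) belief, and a joint-action reward matrix; it uses KL divergence as one possible measure of team mis-coordination and shapes the expected joint-action reward with a penalty proportional to that divergence. The value of communicating is then the shaped reward recovered by synchronising beliefs, and a message is sent only if this value exceeds the message cost. All names and numbers below are hypothetical placeholders.

```python
# Illustrative sketch (hypothetical names/values): value a message as the gain
# in expected shaped reward from synchronising beliefs, and communicate only
# when that gain exceeds the message cost.

import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far two beliefs over hidden states have drifted apart."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def shaped_value(belief, joint_reward, coordination_penalty):
    """Expected reward of the best joint action, shaped by a penalty that
    reflects how mis-coordinated the team currently appears to be."""
    expected = belief @ joint_reward          # expected reward per joint action
    return float(expected.max()) - coordination_penalty


def value_of_communication(own_belief, est_teammate_belief, joint_reward, weight=1.0):
    """Gain in shaped value if both agents were synchronised on own_belief."""
    penalty_now = weight * kl_divergence(own_belief, est_teammate_belief)
    v_without = shaped_value(own_belief, joint_reward, penalty_now)
    v_with = shaped_value(own_belief, joint_reward, 0.0)  # beliefs aligned after sync
    return v_with - v_without


if __name__ == "__main__":
    # Two hidden states, three joint actions; rewards are made-up numbers.
    joint_reward = np.array([[10.0, -5.0, 1.0],
                             [-5.0, 10.0, 1.0]])
    own_belief = np.array([0.85, 0.15])
    est_teammate_belief = np.array([0.5, 0.5])   # teammate presumed out of date
    msg_cost = 0.5
    voc = value_of_communication(own_belief, est_teammate_belief, joint_reward)
    print(f"value of communicating: {voc:.3f} -> send message: {voc > msg_cost}")
```

In this sketch the value of communicating is exactly the coordination penalty removed by synchronising, so the communicate/stay-silent decision reduces to comparing that penalty against the message cost; the paper's model embeds the same trade-off within decentralised POMDP policy generation.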