Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function that depends on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem (DCOP) framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.
