Paper Information - Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains


Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.

Milind Tambe | Matthew E. Taylor | Prateek Tandon | Manish Jain
