Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation

A significant body of work exists on allowing multiple agents to coordinate effectively to achieve a shared goal. In particular, a growing body of work in the Distributed Constraint Optimization (DCOP) framework enables such coordination with different amounts of teamwork. Such algorithms can implicitly or explicitly trade off improved solution quality against increased communication and computation requirements. However, the DCOP framework is limited to planning problems; DCOP agents must have complete and accurate knowledge about the reward function at plan time.

We extend the DCOP framework, defining the Distributed Coordination of Exploration and Exploitation (DCEE) problem class to address real-world problems, such as ad-hoc wireless network optimization, via multiple novel algorithms. DCEE algorithms differ from DCOP algorithms in that they (1) are limited to a finite number of actions in a single trial, (2) attempt to maximize the on-line, rather than final, reward, (3) are unable to exhaustively explore all possible actions, and (4) may have knowledge about the distribution of rewards in the environment, but not the rewards themselves. Thus, a DCEE problem is not a type of planning problem, as DCEE algorithms must carefully balance and coordinate multiple agents' exploration and exploitation.

Two classes of algorithms are introduced: static estimation algorithms perform simple calculations that allow agents to either stay or explore, and balanced exploration algorithms use knowledge about the distribution of the rewards and the time remaining in an experiment to decide whether to stay, explore, or (in some algorithms) backtrack to a previous location. These two classes of DCEE algorithms are compared in simulation and on physical robots in a complex mobile ad-hoc wireless network setting. Contrary to our expectations, we found that increasing teamwork in DCEE algorithms may lower team performance. In contrast, agents running DCOP algorithms improve their reward as teamwork increases. We term this previously unknown phenomenon the team uncertainty penalty, analyze it both in simulation and on robots, and present techniques to ameliorate the penalty.
