Execution-time communication decisions for coordination of multi-agent teams

Although multi-agent teams provide additional functionality and robustness over single-agent systems, they also present additional challenges, mainly due to the difficulty of coordinating multiple agents under uncertainty and partial observability. Agents must reason about the collective state and behaviors of the team as well as about uncertainty in their own environment. In this thesis, we employ Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs), an extension of single-agent POMDPs that can be used to model and coordinate teams of agents. Although the problem of finding optimal policies for Dec-POMDPs is highly intractable, it is known that the presence of free communication transforms a multi-agent Dec-POMDP into a more tractable single-agent POMDP. We use this transformation to generate "centralized" policies for multi-agent teams modeled by Dec-POMDPs, and we facilitate the decentralized execution of these centralized policies by providing algorithms that allow agents to reason about communication at execution time. Our approach trades off the need to do some computation at execution time for the ability to generate policies more tractably at plan time.

This thesis explores the question of how communication can be used effectively to enable the coordination of cooperative multi-agent teams making sequential decisions under uncertainty and partial observability. We identify two fundamental questions that must be answered when reasoning about communication: "When should agents communicate?" and "What should agents communicate?" We present two basic approaches to enabling a team of distributed agents to avoid coordination errors. The first is an algorithm that reasons over the possible joint beliefs of the team; we provide accompanying algorithms that address the questions of when and what agents should communicate. The second approach avoids coordination errors by creating an individual factored policy for each agent. Factored policies provide a means of determining which state features agents should communicate, again answering the questions of when and what agents should communicate. We use factored policies to identify instances of context-specific independence, in which agents can act without needing to consider the actions or observations of their teammates.
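The first approach, reasoning over possible joint beliefs, can be illustrated with a minimal sketch. This is not the thesis's actual algorithm; it only conveys the intuition under simplifying assumptions: each agent tracks the set of joint beliefs consistent with what has been communicated so far, and communication is triggered when those candidate beliefs disagree on the best joint action. The state names, belief representation, and `q_values` table below are all illustrative.

```python
# Hypothetical sketch: decide *when* to communicate by checking whether
# the possible joint beliefs all prescribe the same joint action.
# A belief is a dict mapping state -> probability; q_values maps each
# joint action to a dict of per-state values.

def best_joint_action(belief, q_values):
    # Pick the joint action with the highest expected value under this belief.
    return max(q_values,
               key=lambda a: sum(p * q_values[a][s] for s, p in belief.items()))

def should_communicate(possible_beliefs, q_values):
    # Communicate iff the candidate joint beliefs do not all agree on
    # the best joint action (i.e., miscoordination is possible).
    actions = {best_joint_action(b, q_values) for b in possible_beliefs}
    return len(actions) > 1

# Toy example: two states, two joint actions with opposite preferences.
q = {'a': {'s0': 1.0, 's1': 0.0},
     'b': {'s0': 0.0, 's1': 1.0}}
b1 = {'s0': 0.9, 's1': 0.1}   # favors joint action 'a'
b2 = {'s0': 0.1, 's1': 0.9}   # favors joint action 'b'
print(should_communicate([b1, b2], q))  # beliefs disagree -> True
print(should_communicate([b1], q))      # single belief -> False
```

Agreement among candidate beliefs is a sufficient condition for acting silently here; a fuller treatment would also weigh the expected value of communicating against its cost.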
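The second approach, factored policies with context-specific independence, can likewise be sketched. Again this is only an illustrative reconstruction, not the thesis's implementation: a factored policy is represented as a decision tree over state features, and an agent can act without communication whenever the path to a leaf tests only features it already knows. When the tree branches on an unknown (e.g., teammate-owned) feature, that feature is exactly *what* must be communicated. The feature names and tree below are hypothetical.

```python
# Hypothetical sketch: a factored policy as a decision tree over state
# features, used to decide *what* to communicate.

class Leaf:
    def __init__(self, action):
        self.action = action

class Node:
    def __init__(self, feature, children):
        self.feature = feature    # state feature tested at this node
        self.children = children  # dict: feature value -> subtree

def act_or_request(tree, known):
    """Walk the policy tree using known feature values.
    Returns ('act', action) if the action is determined without the
    unknown features (context-specific independence), or
    ('ask', feature) naming the first unknown feature branched on."""
    while isinstance(tree, Node):
        if tree.feature not in known:
            return ('ask', tree.feature)  # this feature must be communicated
        tree = tree.children[known[tree.feature]]
    return ('act', tree.action)

# Toy policy: in room_a the agent pushes regardless of its teammate;
# in room_b the right action depends on the teammate's location.
policy = Node('my_location', {
    'room_a': Leaf('push'),
    'room_b': Node('teammate_location', {
        'room_a': Leaf('wait'),
        'room_b': Leaf('push'),
    }),
})
print(act_or_request(policy, {'my_location': 'room_a'}))
# ('act', 'push') -- teammate's location is irrelevant in this context
print(act_or_request(policy, {'my_location': 'room_b'}))
# ('ask', 'teammate_location') -- this feature must be communicated
```

The `room_a` branch exhibits context-specific independence: the agent's action is fixed by its own features, so no communication is needed in that context.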