Coordinating Multiagent Teams in Uncertain Domains Using Distributed POMDPs

Distributed partially observable Markov decision problems (POMDPs) have emerged as a popular decision-theoretic approach to planning for multiagent teams, where agents must reason about the rewards (and costs) of their actions in the presence of uncertainty. However, finding the optimal distributed POMDP policy is computationally intractable (NEXP-complete). This dissertation presents two independent approaches to this intractability. The primary focus is the first approach, a principled way to combine the two dominant paradigms for building multiagent team plans: the "belief-desire-intention" (BDI) approach and distributed POMDPs. In this hybrid BDI-POMDP approach, BDI team plans are exploited to improve distributed POMDP tractability, while distributed POMDP-based analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: deciding which agents to allocate to which roles in the team. The hybrid BDI-POMDP approach makes three key contributions. First, unlike prior work on multiagent role allocation, our role allocation technique takes future uncertainties in the domain into account. Second, we present a novel decomposition technique that exploits the structure of BDI team plans to significantly prune the search space of combinatorially many role allocations. Third, we give a significantly faster policy evaluation algorithm suited to our hybrid BDI-POMDP approach. Finally, we present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation. In the RoboCupRescue domain, we show that our role allocation technique performs at the level of human experts, as measured against the allocations chosen by human participants in the actual RoboCupRescue simulation environment.
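The pruning idea behind the second contribution can be illustrated with a toy branch-and-bound search over role allocations. Everything below (the agent names, roles, capability scores, and the additive objective) is an illustrative assumption, not the thesis's actual model: the dissertation scores an allocation by the POMDP-based expected reward of the resulting team plan under uncertainty, whereas this sketch uses static per-agent scores purely to show how an admissible upper bound prunes the combinatorial space.

```python
ROLES = ["scout", "transport"]
AGENTS = ["A1", "A2", "A3"]

# Hypothetical per-(agent, role) scores; in the thesis an allocation is
# instead evaluated by expected team reward under domain uncertainty.
CAP = {
    ("A1", "scout"): 0.9, ("A1", "transport"): 0.3,
    ("A2", "scout"): 0.5, ("A2", "transport"): 0.8,
    ("A3", "scout"): 0.4, ("A3", "transport"): 0.6,
}

def branch_and_bound():
    """Search all role allocations, pruning any branch whose optimistic
    completion cannot beat the best allocation found so far."""
    best = {"value": float("-inf"), "alloc": None}

    def upper_bound(value, i):
        # Admissible bound: assume every unassigned agent gets its best role.
        return value + sum(max(CAP[(a, r)] for r in ROLES) for a in AGENTS[i:])

    def search(i, value, alloc):
        if upper_bound(value, i) <= best["value"]:
            return  # prune: even the optimistic completion cannot win
        if i == len(AGENTS):
            best["value"], best["alloc"] = value, dict(alloc)
            return
        agent = AGENTS[i]
        for role in ROLES:
            alloc[agent] = role
            search(i + 1, value + CAP[(agent, role)], alloc)
            del alloc[agent]

    search(0, 0.0, {})
    return best["alloc"], best["value"]

alloc, value = branch_and_bound()
print(alloc, value)  # the allocation maximizing the summed scores
```

The bound is the same device the decomposition technique relies on: an over-estimate of what any completion of a partial allocation could achieve lets whole subtrees of the combinatorial search be discarded without evaluating them.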
The second approach to the intractability of distributed POMDPs finds locally optimal joint policies, using Nash equilibrium as the solution concept. By introducing communication, we not only improve coordination but also develop a novel compact policy representation that yields savings in both space and time, which we verify empirically.
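The local-search idea can be sketched as alternating best responses: hold every agent's policy fixed except one, replace that agent's policy with a best response, and repeat until no agent can improve, at which point the joint policy is by definition a Nash equilibrium. The sketch below reduces this to a one-shot two-agent coordination game with invented payoffs; the actual approach performs the same alternating maximization over full observation-history policies of a distributed POMDP.

```python
ACTIONS = ["a", "b"]

def payoff(joint):
    # Illustrative coordination game: ("b", "b") is globally best,
    # ("a", "a") is second best, and mismatched actions earn nothing.
    return {("a", "a"): 1.0, ("b", "b"): 2.0}.get(joint, 0.0)

def best_response(i, joint):
    # Agent i's value-maximizing action with the other agents held fixed.
    def value_if(act):
        trial = list(joint)
        trial[i] = act
        return payoff(tuple(trial))
    return max(ACTIONS, key=value_if)

def alternating_best_response(joint):
    """Cycle over agents, replacing each policy with a strict best
    response, until no agent can improve: a Nash equilibrium."""
    joint = list(joint)
    changed = True
    while changed:
        changed = False
        for i in range(len(joint)):
            br = best_response(i, joint)
            trial = joint[:i] + [br] + joint[i + 1:]
            if payoff(tuple(trial)) > payoff(tuple(joint)):
                joint = trial
                changed = True
    return tuple(joint)
```

Starting from ("a", "a") the search terminates immediately with value 1, even though ("b", "b") achieves 2: neither agent can improve unilaterally. That is the price of local optimality the approach accepts in exchange for tractability.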
