Abstracting Influences for Efficient Multiagent Coordination Under Uncertainty

When planning optimal decisions for teams of agents acting in uncertain domains, conventional methods explicitly coordinate all joint policy decisions and, in doing so, are inherently susceptible to the curse of dimensionality: state, action, and observation spaces grow exponentially with the number of agents. With the goal of extending the scalability of optimal team coordination, the research presented in this dissertation examines how agents can reduce the amount of information they need to coordinate. Intuitively, to the extent that agents are weakly coupled, they can avoid the complexity of coordinating all decisions; they need only coordinate abstractions of their policies that convey their essential influences on one another.

In formalizing this intuition, I consider several complementary aspects of weakly-coupled problem structure, including agent scope size, the number of peers whose decisions influence a given agent's decisions, and degree of influence, the proportion of unique influences that peers can feasibly exert. To exploit this structure, I introduce a transition-dependent decentralized POMDP model that decomposes efficiently into local decision models with shared state features. This context yields a novel characterization of influences as transition probabilities, compactly encoded using a dynamic Bayesian network. Not only is this influence representation provably sufficient for optimal coordination, but it also allows me to frame the subproblems of (1) proposing influences, (2) evaluating influences, and (3) computing optimal policies around influences as mixed-integer linear programs.

The primary advantage of working in the influence space is that the space of feasible influences can be substantially smaller than the space of joint policies. Blending prior work on decoupled joint policy search and constraint optimization, I develop influence-space search algorithms that, for problems with a low degree of influence, compute optimal solutions orders of magnitude faster than policy-space search. When agents' influences are constrained, influence-space search also outperforms other state-of-the-art optimal solution algorithms. Moreover, by exploiting both degree of influence and agent scope size, I demonstrate scalability well beyond the reach of prior optimal methods, to teams of 50 weakly-coupled, transition-dependent agents.
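To make the central idea concrete, below is a minimal Python sketch of searching the influence space rather than the policy space. It is illustrative only: the Agent class, the local_best_response_value method, and the brute-force enumeration are hypothetical stand-ins (the dissertation's algorithms instead search the influence space with constraint optimization and solve local best responses as mixed-integer linear programs).

```python
import itertools

# Minimal, self-contained sketch of influence-space search on a toy
# two-agent problem. All names here are illustrative assumptions, not
# the dissertation's actual implementation.

class Agent:
    def __init__(self, values):
        # values: maps each joint influence (a tuple, one influence per
        # agent) to the value of this agent's locally optimal policy
        # under that influence.
        self.values = values

    def local_best_response_value(self, joint_influence):
        # In the dissertation's framing, this step would solve the
        # agent's local decision model (e.g., as a mixed-integer linear
        # program); here we simply look up a precomputed value.
        return self.values[joint_influence]

def influence_space_search(agents, feasible_influences):
    """Enumerate joint influences rather than joint policies.

    feasible_influences[i] lists the distinct influences agent i can
    exert on the shared state features; each influence abstracts many
    local policies, so with a low degree of influence this product is
    far smaller than the joint policy space.
    """
    best_value, best_joint = float("-inf"), None
    for joint in itertools.product(*feasible_influences):
        # Each agent plans against only its peers' influences, never
        # their full policies.
        value = sum(a.local_best_response_value(joint) for a in agents)
        if value > best_value:
            best_value, best_joint = value, joint
    return best_joint, best_value

# Toy usage: two agents, each able to exert one of two influences.
feasible = [("a0", "a1"), ("b0", "b1")]
joints = list(itertools.product(*feasible))
agent1 = Agent({j: {"a0": 1.0, "a1": 2.0}[j[0]] for j in joints})
agent2 = Agent({j: 3.0 if j == ("a1", "b0") else 1.0 for j in joints})
print(influence_space_search([agent1, agent2], feasible))
# -> (('a1', 'b0'), 5.0)
```

Even in this toy setting, the search touches only four joint influences, while each influence may summarize arbitrarily many concrete policies; that gap is what makes influence-space search pay off on weakly-coupled problems.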
