Distributed Planning in Hierarchical Factored MDPs

We present a principled and efficient planning algorithm for collaborative multiagent dynamical systems. All computation, during both the planning and execution phases, is distributed among the agents; each agent needs to model and plan for only a small part of the system. Each of these local subsystems is small, but combined they can represent an exponentially larger problem. The subsystems are connected through a subsystem hierarchy. Coordination and communication between the agents are not imposed, but are derived directly from the structure of this hierarchy. A globally consistent plan is achieved by a message passing algorithm, in which messages correspond to natural local reward functions and are computed by local linear programs; a second message passing algorithm allows us to execute the resulting policy. When two portions of the hierarchy share the same structure, our algorithm can reuse plans and messages to speed up computation.
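For background on the "local linear programs" mentioned above, the following is a minimal sketch of the classic linear-programming formulation of single-agent MDP planning that such local planners build on. It is not the paper's distributed, hierarchical algorithm; the two-state, two-action MDP, its numbers, and all variable names are illustrative assumptions.

```python
# Background sketch only: the classic LP formulation of MDP planning,
# minimize sum_s V(s) subject to the Bellman inequalities
#   V(s) >= R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s')   for every (s, a).
# The toy MDP below is hypothetical, not taken from the paper.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 2, 2, 0.95

# P[a, s, s'] = transition probability; R[s, a] = immediate reward (illustrative numbers).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Build the LP over the value-function variables V(s).
c = np.ones(n_states)                      # objective: minimize sum_s V(s)
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        # (gamma * P(.|s,a) - e_s) @ V <= -R(s,a) encodes the Bellman inequality.
        A_ub.append(gamma * P[a, s] - np.eye(n_states)[s])
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=(None, None), method="highs")
V = res.x

# Greedy policy with respect to the LP solution.
Q = R + gamma * np.einsum("ast,t->sa", P, V)
print("V*:", V, "policy:", Q.argmax(axis=1))
```

In a factored or hierarchical setting, each subsystem would solve a small LP of this general form over its own variables, and the resulting local value or reward information is what gets exchanged as messages.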
