Decomposition Techniques for Planning in Stochastic Domains

This paper is concerned with modeling planning problems involving uncertainty as discrete-time, finite-state stochastic automata. Solving planning problems is reduced to computing policies for Markov decision processes. Classical methods for solving Markov decision processes cannot cope with the size of the state spaces for typical problems encountered in practice. As an alternative, we investigate methods that decompose global planning problems into a number of local problems, solve the local problems separately, and then combine the local solutions to generate a global solution. We present algorithms that decompose planning problems into smaller problems given an arbitrary partition of the state space. The local problems are interpreted as Markov decision processes, and solutions to the local problems are interpreted as policies restricted to the subsets of the state space defined by the partition. One algorithm relies on constructing and solving an abstract version of the original decision problem. A second algorithm iteratively approximates parameters of the local problems to converge to an optimal solution. We show how properties of a specified partition affect the time and storage required for these algorithms.
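The iterative decomposition idea described in the abstract can be illustrated with a minimal sketch (not the paper's exact algorithms): split the state space by an arbitrary partition, solve each block as a local Markov decision process while values outside the block are held fixed as boundary conditions, and repeat over blocks until the global value function converges. The toy MDP below (states, transitions, rewards, and the partition) is invented purely for illustration.

```python
# Block-decomposed value iteration: a hedged sketch of solving local
# problems over a state-space partition and combining their solutions.

GAMMA = 0.9
STATES = [0, 1, 2, 3]
ACTIONS = [0, 1]
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
# These numbers are an invented toy example, not from the paper.
P = {
    0: {0: [(1, 1.0)], 1: [(2, 1.0)]},
    1: {0: [(0, 0.5), (3, 0.5)], 1: [(3, 1.0)]},
    2: {0: [(3, 1.0)], 1: [(0, 1.0)]},
    3: {0: [(3, 1.0)], 1: [(2, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 1.0, 1: 0.0},
    2: {0: 2.0, 1: 0.0},
    3: {0: 0.0, 1: 0.0},
}
PARTITION = [{0, 1}, {2, 3}]  # an arbitrary partition of the state space


def solve_block(block, V, sweeps=100):
    """Value iteration restricted to one block of the partition;
    values of states outside the block act as fixed boundary values."""
    for _ in range(sweeps):
        for s in block:
            V[s] = max(
                R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                for a in ACTIONS
            )


def decomposed_value_iteration(tol=1e-6):
    """Sweep over the local problems until the global values stabilize."""
    V = {s: 0.0 for s in STATES}
    while True:
        old = dict(V)
        for block in PARTITION:
            solve_block(block, V)
        if max(abs(V[s] - old[s]) for s in STATES) < tol:
            return V


V = decomposed_value_iteration()
# The global policy is recovered greedily from the combined value function.
policy = {
    s: max(ACTIONS,
           key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a]))
    for s in STATES
}
```

Because every state is updated on every outer sweep, this is a form of asynchronous value iteration and converges to the same fixed point as a global solver; the decomposition pays off when each block can be stored and solved independently.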
