Structured Solution Methods for Non-Markovian Decision Processes

Markov Decision Processes (MDPs), currently a popular framework for modeling and solving decision-theoretic planning problems, are limited by the Markovian assumption: rewards and dynamics depend only on the current state, not on previous history. Non-Markovian decision processes (NMDPs) can also be defined, but the more tractable solution techniques developed for MDPs cannot then be applied directly. In this paper, we show how an NMDP, in which temporal logic is used to specify history dependence, can be automatically converted into an equivalent MDP by adding appropriate temporal variables. The resulting MDP can be represented in a structured fashion and solved using structured policy construction methods. In many cases, this offers significant computational advantages over previous proposals for solving NMDPs.
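To make the conversion idea concrete, the following is a minimal sketch in Python, not the paper's actual construction. It assumes a hypothetical delivery domain with a "since"-style history-dependent reward: a delivery earns reward only if a request occurred earlier and has not yet been served. The fluent names `requested` and `delivered` and the added temporal variables `pending` and `served` are illustrative inventions, not taken from the paper.

```python
# A minimal sketch (not the construction from the paper) of the core idea:
# a history-dependent reward becomes Markovian once the state is augmented
# with temporal variables that summarize the relevant history.

from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentedState:
    pending: bool  # temporal variable: an unserved request exists
    served: bool   # temporal variable: a request was served this step


def advance(prev: AugmentedState, requested: bool, delivered: bool) -> AugmentedState:
    """Update the temporal variables given this step's base fluents.

    `pending` persists until a delivery clears it; `served` records that
    this step's delivery answered an outstanding (or simultaneous) request.
    """
    outstanding = prev.pending or requested
    return AugmentedState(
        pending=outstanding and not delivered,
        served=delivered and outstanding,
    )


def reward(s: AugmentedState) -> float:
    # Markovian reward: a function of the current augmented state alone,
    # even though the behavior it rewards spans multiple time steps.
    return 1.0 if s.served else 0.0


if __name__ == "__main__":
    s = AugmentedState(pending=False, served=False)
    # (requested, delivered) per step: request at t=0, delivery at t=2.
    for requested, delivered in [(True, False), (False, False), (False, True)]:
        s = advance(s, requested, delivered)
        print(s, reward(s))  # reward 1.0 fires only at the delivery step
```

The `advance` function plays the role of the added temporal variables' dynamics: once these variables are part of the state, the reward is a function of the current state alone, and standard MDP solution methods, including the structured policy construction methods the paper employs, apply unchanged.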
