Dynamic Programming for Structured Continuous Markov Decision Problems

We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.

[1]  Jesse Hoey,et al.  SPUDD: Stochastic Planning using Decision Diagrams , 1999, UAI.

[2]  Manuela M. Veloso,et al.  Tree Based Discretization for Continuous State Space Reinforcement Learning , 1998, AAAI/IAAI.

[3]  Michael L. Littman,et al.  Exact Solutions to Time-Dependent MDPs , 2000, NIPS.

[4]  Sebastian Thrun,et al.  Monte Carlo POMDPs , 1999, NIPS.

[5]  David E. Smith,et al.  Planning Under Continuous Time and Resource Uncertainty: A Challenge for AI , 2002, AIPS Workshop on Planning for Temporal Domains.

[6]  Sylvia Richardson,et al.  Markov Chain Monte Carlo in Practice , 1997 .

[7]  Geoffrey J. Gordon Stable Function Approximation in Dynamic Programming , 1995, ICML.

[8]  Craig Boutilier,et al.  Stochastic dynamic programming with factored representations , 2000, Artif. Intell..

[9]  Michael L. Littman,et al.  Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.

[10]  John Amanatides,et al.  Merging BSP trees yields polyhedral set operations , 1990, SIGGRAPH.

[11]  Robert Givan,et al.  Model Minimization in Markov Decision Processes , 1997, AAAI/IAAI.

[12]  Andrew W. Moore,et al.  Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[13]  Zhengzhu Feng,et al.  Symbolic heuristic search for factored Markov decision processes , 2002, AAAI/IAAI.

[14]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[15]  Thomas G. Dietterich,et al.  Explanation-Based Learning and Reinforcement Learning: A Unified View , 1995, Machine-mediated learning.

[16]  Andrew W. Moore,et al.  Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.