Model Minimization in Markov Decision Processes

We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space of an MDP is homogeneous if, for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP that is minimal in a well-defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata-minimization algorithms and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely related to several recently published algorithms for finding optimal solutions to (partially or fully observable) factored Markov decision processes, thereby providing alternative descriptions of the methods and results concerning those algorithms.
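
The paper's algorithm works on factored representations without enumerating states; as a rough illustration of the underlying idea only, here is a minimal explicit-state sketch in Python of block-splitting partition refinement toward a homogeneous partition. The function name, the `T[a][s][s2]` transition encoding, and the toy MDP are assumptions made for this sketch, not the paper's implementation.

```python
from collections import defaultdict

def coarsest_homogeneous_refinement(actions, T, initial_partition):
    """Refine initial_partition until it is stochastically homogeneous:
    for every action, all states in a block have the same total probability
    of transitioning into each block of the partition.

    T[a][s][s2] = Pr(s2 | s, a); zero-probability entries may be omitted.
    initial_partition is an iterable of frozensets of states.
    """
    partition = list(initial_partition)
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states by their "signature": for each action, the
            # probability mass sent into each current block (rounded to
            # guard against floating-point noise).
            groups = defaultdict(set)
            for s in block:
                sig = tuple(
                    tuple(
                        round(sum(T[a].get(s, {}).get(t, 0.0) for t in b), 12)
                        for b in partition
                    )
                    for a in actions
                )
                groups[sig].add(s)
            if len(groups) > 1:
                changed = True  # block was split; another sweep is needed
            new_partition.extend(frozenset(g) for g in groups.values())
        partition = new_partition
    return partition

# Toy 4-state example (hypothetical). Suppose rewards distinguish s2 from
# s3, so the initial partition is {s0, s1}, {s2}, {s3}. Under action 'a',
# s0 reaches s2 with probability 1 while s1 splits its mass between s2
# and s3, so refinement must separate s0 from s1.
T = {'a': {
    's0': {'s2': 1.0},
    's1': {'s2': 0.5, 's3': 0.5},
    's2': {'s2': 1.0},
    's3': {'s3': 1.0},
}}
blocks = coarsest_homogeneous_refinement(
    ['a'], T,
    [frozenset({'s0', 's1'}), frozenset({'s2'}), frozenset({'s3'})])
print(blocks)  # four singleton blocks: s0 and s1 are split apart
```

Each block of the resulting partition can serve as a single state of the reduced MDP, since by construction all of its member states behave identically with respect to the blocks; this explicit-state sweep is quadratic in the state count and is exactly what the paper's factored algorithm avoids.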
