Policy Iteration for Relational MDPs

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning and stochastic planning problems. Recent work has developed representation schemes and algorithms for planning in such problems using the value iteration algorithm. However, exact versions of more complex algorithms, including policy iteration, have not been developed or analyzed. This paper investigates this potential and makes several contributions. First, we observe two anomalies for relational representations, showing that the value of some policies is not well defined or cannot be calculated under the restricted representation schemes used in the literature. Second, we develop a variant of policy iteration that circumvents these anomalies. The algorithm incorporates an aspect of policy improvement into the process of policy evaluation and thus differs from the standard algorithm. We show that despite this difference the algorithm converges to the optimal policy.
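For background, the sketch below shows classical policy iteration on a flat, tabular MDP; it is a generic illustration, not the relational variant developed in this paper (which interleaves an improvement step into policy evaluation itself). The array shapes P[a, s, s'] for transitions and R[s, a] for rewards are assumptions for the example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Classical policy iteration for a flat (tabular) MDP.

    P: array of shape (A, S, S), P[a, s, s'] = transition probability.
    R: array of shape (S, A), expected immediate reward.
    Returns an optimal deterministic policy and its value function.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)            # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]       # (S, S) transitions under the policy
        R_pi = R[np.arange(n_states), policy]          # (S,) rewards under the policy
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # (S, A) action values
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):         # fixed point: policy is optimal
            return policy, V
        policy = new_policy
```

In the relational setting studied here, states are not enumerable in this way, which is precisely why the exact evaluation step above cannot be carried over directly.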