Knowledge Transfer in Markov Decision Processes

Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP for which an optimal policy is known, and to modify this policy as needed. We present a framework for measuring the quality of knowledge transfer when transferring policies from one MDP to another. Our formulation is based on MDP bisimulation metrics, which provide a stable quantitative notion of state similarity for MDPs. Given two MDPs and a state mapping from the first to the second, a policy defined on the latter naturally induces a policy on the former. We provide a bound on the value function of the induced policy, showing that if the two MDPs are behaviorally close in terms of bisimulation distance and the original policy is close to optimal, then the induced policy is guaranteed to be close to optimal as well. We also present experiments in which simple MDPs are used to test the tightness of the bound provided by the bisimulation distance. In light of these experimental results, we suggest a new similarity measure.
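
As a concrete reminder of the objects involved, the following is a minimal sketch in standard notation. It assumes a Ferns-style bisimulation metric on finite MDPs with a shared action set A (computed over the disjoint union of the two state spaces) and is illustrative only; the exact constants and the form of the value-function bound are developed in the body of the paper. Here d is the bisimulation metric, W_d is the Kantorovich (Wasserstein) distance with ground metric d, m : S_1 -> S_2 is the state mapping, and pi is the policy given on the second MDP.

\[
d(s,t) \;=\; \max_{a \in A}\Big( c_R\,\bigl|R(s,a) - R(t,a)\bigr| \;+\; c_T\, W_d\bigl(P(\cdot \mid s,a),\, P(\cdot \mid t,a)\bigr) \Big), \qquad c_R + c_T \le 1,
\]
\[
\tilde{\pi}(s) \;=\; \pi\bigl(m(s)\bigr) \qquad \text{for all } s \in S_1 .
\]

Under this construction, the bound referred to above relates the suboptimality of the induced policy \(\tilde{\pi}\) in the first MDP to the suboptimality of \(\pi\) in the second MDP and to the bisimulation distances \(d(s, m(s))\) between each state and its image under the mapping.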