Knowledge Transfer in Markov Decision Processes

Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP for which an optimal policy is known, and to modify this policy as needed. We present a framework for measuring the quality of knowledge transfer when transferring policies from one MDP to another. Our formulation is based on MDP bisimulation metrics, which provide a stable quantitative notion of state similarity for MDPs. Given two MDPs and a state mapping from the first to the second, a policy defined on the latter naturally induces a policy on the former. We provide a bound on the value function of the induced policy, showing that if the two MDPs are behaviorally close in terms of bisimulation distance and the original policy is close to optimal, then the induced policy is guaranteed to be close to optimal as well. We also present experiments in which simple MDPs are used to test the tightness of the bound provided by the bisimulation distance. In light of these experimental results, we suggest a new similarity measure.
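
As a concrete reminder of the objects involved, the following is a minimal sketch in standard notation. It assumes a Ferns-style bisimulation metric on finite MDPs with a shared action set A (computed over the disjoint union of the two state spaces) and is illustrative only; the exact constants and the form of the value-function bound are developed in the body of the paper. Here d is the bisimulation metric, W_d is the Kantorovich (Wasserstein) distance with ground metric d, m : S_1 -> S_2 is the state mapping, and pi is the policy given on the second MDP.

\[
d(s,t) \;=\; \max_{a \in A}\Big( c_R\,\bigl|R(s,a) - R(t,a)\bigr| \;+\; c_T\, W_d\bigl(P(\cdot \mid s,a),\, P(\cdot \mid t,a)\bigr) \Big), \qquad c_R + c_T \le 1,
\]
\[
\tilde{\pi}(s) \;=\; \pi\bigl(m(s)\bigr) \qquad \text{for all } s \in S_1 .
\]

Under this construction, the bound referred to above relates the suboptimality of the induced policy \(\tilde{\pi}\) in the first MDP to the suboptimality of \(\pi\) in the second MDP and to the bisimulation distances \(d(s, m(s))\) between each state and its image under the mapping.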