Reusing Learned Policies Between Similar Problems

We are interested in leveraging policies learned for similar problems when learning a policy for a complex problem. This capability is particularly important in robot learning, where gathering data is expensive and time-consuming, which prohibits directly applying reinforcement learning. In this case, we would like to transfer knowledge from a simulator, which may have an inaccurate or crude model of the robot and environment. We observed that when a policy learned in a simulator is applied, some parts of the policy apply effectively to the real robot while other parts do not. We therefore explored learning a complex problem by reusing only parts of the solutions of similar problems. Empirical experiments on learning when part of the policy is fixed show that the complete task is learned faster, but the resulting policy is suboptimal. One of the main contributions of this paper is a theorem, with its proof, stating that the degree of suboptimality of a policy that is fixed over a subproblem can be determined without the need to optimally solve the complete problem. We formally define a subproblem and build upon the value equivalence of the boundary states of the subproblem to prove the bound on suboptimality.
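
As a minimal sketch of the setting (the notation below is ours and the exact form of the bound is an assumption, not the paper's statement): model the task as an MDP with state space $S$ and optimal value function $V^{*}$, and let a subproblem cover a subset $S_{\mathrm{sub}} \subseteq S$ whose boundary states $\partial S_{\mathrm{sub}}$ are the states of $S_{\mathrm{sub}}$ from which a transition can leave $S_{\mathrm{sub}}$. If the policy $\pi$ that is fixed over the subproblem is value-equivalent to the optimal policy on the boundary up to some $\varepsilon \ge 0$,

    % Hypothetical formalization of "value equivalence of the boundary
    % states": the fixed policy's value deviates from the optimal value
    % by at most epsilon on every boundary state.
    \max_{s \in \partial S_{\mathrm{sub}}} \bigl| V^{\pi}(s) - V^{*}(s) \bigr| \le \varepsilon

then a theorem of the kind the abstract describes bounds the global suboptimality $V^{*}(s_0) - V^{\pi}(s_0)$ from any start state $s_0$ by a function of $\varepsilon$ alone, so the bound can be evaluated without optimally solving the complete problem.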