Action-Space Knowledge Transfer in MDPs: Formalism, Suboptimality Bounds, and Algorithms

Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large state sets. Large action sets, however, have received considerably less attention. This paper studies the use of knowledge transfer between related tasks to accelerate learning with large action sets. We introduce action transfer, a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal actions make up a small fraction of the domain's action set, action transfer can substantially reduce the number of actions and thus the complexity of the problem. However, action transfer between dissimilar tasks can be detrimental. We present a novel formalism of related tasks and use it to derive a bound on the suboptimality of action transfer. We additionally bound action-transfer suboptimality in generic MDPs and analyze the feasibility of provably reliable action transfer. We build on this analysis to propose randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks. The empirical results in this paper show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.
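To make the core idea of action transfer concrete, the following is a minimal sketch under simplifying assumptions: a tabular Q-learning agent, a toy environment interface with `n_states`, `reset`, and `step`, and illustrative helper names (`q_learning`, `extract_transferred_actions`). These are assumptions for exposition, not the paper's implementation or experimental setup.

```python
import numpy as np

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning restricted to a given action set (assumed env interface)."""
    Q = np.zeros((env.n_states, len(actions)))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy selection over the (possibly reduced) action set.
            if np.random.rand() < epsilon:
                a_idx = np.random.randint(len(actions))
            else:
                a_idx = int(np.argmax(Q[s]))
            s2, r, done = env.step(actions[a_idx])
            Q[s, a_idx] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a_idx])
            s = s2
    return Q

def extract_transferred_actions(Q, actions):
    """Keep only the actions that are greedy in some state of the source task."""
    greedy_indices = np.unique(np.argmax(Q, axis=1))
    return [actions[i] for i in greedy_indices]

# Action transfer, schematically: solve the source task with the full action
# set, then learn the target task using only the actions the source solution
# actually used.
#   Q_src = q_learning(source_env, full_action_set)
#   transferred = extract_transferred_actions(Q_src, full_action_set)
#   Q_tgt = q_learning(target_env, transferred)
```

When the transferred set is much smaller than the full action set, each target-task update searches over far fewer actions, which is the source of the speedup the abstract describes; the paper's formalism and RTP address the case where the source task's greedy actions are not representative of the target task's needs.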