Multitask model-free reinforcement learning

Andrew Saxe, Stanford University, Stanford, CA, USA

Abstract: Conventional model-free reinforcement learning algorithms are limited to performing a single task, such as navigating to one goal location in a maze or reaching one goal state in the Tower of Hanoi block-manipulation problem. It has been thought that only model-based algorithms could perform goal-directed actions, optimally adapting to new reward structures in the environment. In this work, we develop a new model-free algorithm capable of learning about many different tasks simultaneously and blending them together to perform novel, never-before-seen tasks. Our algorithm has the crucial property that, when performing a blend of previously learned tasks, it provably performs optimally. The algorithm learns a distributed representation of tasks, thereby avoiding the curse of dimensionality that afflicts other hierarchical reinforcement learning approaches, such as the options framework, which cannot blend subtasks. This result forces a reevaluation of experimental paradigms that use goal-directed behavior to argue for model-based algorithms.
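The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one standard construction with the stated property (exact optimality when blending previously learned tasks): compositionality in linearly solvable MDPs (Todorov, 2009), where each task's desirability function z = exp(v) is learned model-free by Z-learning, and the optimal desirability of a blended task is the same linear blend of the learned z functions. The chain environment, step sizes, and task definitions below are hypothetical choices for the example, not the paper's code.

# Hedged sketch: model-free Z-learning on two base tasks, then exact blending.
import numpy as np

n_states = 7                      # 1-D chain; states 0 and n-1 are terminal

def passive_step(s, rng):
    # Passive dynamics: unbiased random walk on the chain.
    return s + rng.choice([-1, 1])

def z_learning(terminal_reward, episodes=20000, alpha=0.1, seed=0):
    # Model-free Z-learning from passive-dynamics samples (Todorov, 2009).
    rng = np.random.default_rng(seed)
    z = np.ones(n_states)
    z[0], z[-1] = np.exp(terminal_reward[0]), np.exp(terminal_reward[-1])
    for _ in range(episodes):
        s = rng.integers(1, n_states - 1)          # start in the interior
        while s not in (0, n_states - 1):
            s_next = passive_step(s, rng)
            # Interior reward is 0 in this example, so exp(r) = 1.
            z[s] = (1 - alpha) * z[s] + alpha * z[s_next]
            s = s_next
    return z

# Two base tasks: reach the left end vs. reach the right end of the chain.
r_left  = np.array([0.0] + [0.0] * (n_states - 2) + [-10.0])   # goal = state 0
r_right = np.array([-10.0] + [0.0] * (n_states - 2) + [0.0])   # goal = last state
z_left  = z_learning(r_left)
z_right = z_learning(r_right)

# Blend the learned tasks: for terminal rewards defined by
#   exp(r_blend) = w1 * exp(r_left) + w2 * exp(r_right),
# the optimal desirability of the blended task is exactly w1*z_left + w2*z_right,
# with no further learning required.
w1, w2 = 0.3, 0.7
z_blend = w1 * z_left + w2 * z_right
v_blend = np.log(z_blend)          # optimal value function of the novel task
print(np.round(v_blend, 3))

In this construction the "distributed representation" is the set of learned desirability functions, and any task expressible as an exp-space mixture of the base tasks is solved immediately by weighting them; whether the paper's algorithm uses this particular formalism is an assumption of the sketch.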