Learning Reusable Options for Multi-Task Reinforcement Learning

Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although many algorithms allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand is available. For many practical applications, it may be infeasible for an agent to learn to solve a task from scratch, since doing so is generally computationally expensive; prior experience, however, can be leveraged to make such problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that, after an agent learns policies for a small number of problems, the trajectories generated by those policies can be used to learn reusable options that allow the agent to quickly learn how to solve novel, related problems.
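
To ground the terminology, the sketch below illustrates the standard options framework of Sutton, Precup and Singh (1999), in which an option consists of an initiation set, an intra-option policy, and a termination condition, together with one simple way an option policy could be fit from trajectories produced by previously learned task policies (behavioral cloning by action counting). This is a minimal illustrative sketch under assumed discrete states and actions; the names Option and fit_option_policy, the tabular policy representation, and all parameter choices are hypothetical and do not describe the implementation used in the paper.

# Minimal sketch (not the authors' implementation) of the options framework:
# an option is a triple (initiation set I, intra-option policy pi, termination beta).
# The intra-option policy is fit by behavioral cloning on (state, action) trajectories
# collected from previously learned policies; states and actions are assumed discrete.
from dataclasses import dataclass
from typing import Callable, List, Tuple
import numpy as np

State = int      # assumed discrete state id (e.g., a grid-world cell)
Action = int     # assumed discrete action id

@dataclass
class Option:
    initiation: Callable[[State], bool]    # I(s): may the option be started in s?
    policy: np.ndarray                     # pi[s, a]: intra-option action probabilities
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

    def act(self, s: State, rng: np.random.Generator) -> Action:
        # Sample an action from the intra-option policy for state s.
        return int(rng.choice(len(self.policy[s]), p=self.policy[s]))


def fit_option_policy(trajectories: List[List[Tuple[State, Action]]],
                      n_states: int, n_actions: int,
                      smoothing: float = 1.0) -> np.ndarray:
    """Estimate an intra-option policy from (state, action) trajectories by
    counting the actions taken in each state (behavioral cloning with Laplace smoothing)."""
    counts = np.full((n_states, n_actions), smoothing)
    for traj in trajectories:
        for s, a in traj:
            counts[s, a] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


# Usage: in the multi-task setting, the trajectories would come from policies
# already learned on a few source tasks; here they are toy data.
rng = np.random.default_rng(0)
demo = [[(0, 1), (1, 1), (2, 3)], [(0, 1), (1, 2), (2, 3)]]
pi = fit_option_policy(demo, n_states=4, n_actions=4)
option = Option(initiation=lambda s: s == 0,                   # assumed: may only start in state 0
                policy=pi,
                termination=lambda s: 1.0 if s == 3 else 0.1)  # assumed termination probabilities
print(option.act(0, rng))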
