In this work, a novel approach to Transfer Learning for Deep Reinforcement Learning is introduced. The agent is realized as an actor-critic framework, namely the Deep Deterministic Policy Gradient (DDPG) algorithm. The Q-function and the policy are represented as deep feed-forward networks, which are trained by minimizing the mean squared Bellman error and by maximizing the expected return, respectively. For Transfer Learning, the actor is modified with a new regularization term, called the knowledge regularizer. It allows prior knowledge, in the form of an existing policy, to be included in the learning process. During each gradient step, the knowledge regularizer shifts the current weight vector towards a region of the weight space that is centered around the weights of the existing policy. Because neural networks are universal and smooth function approximators, the weights of the existing policy and those of a policy for a related new task can be expected to lie close to each other in weight space. Solving a new task therefore benefits from the prior knowledge when it is used to adjust the gradient given by the critic. Experiments show that the knowledge regularizer increases the performance achieved by the agent and reduces the learning time. Furthermore, the knowledge regularizer can be used as a replacement for labeled training data, which makes it especially useful for physical applications.
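The abstract does not state the exact form of the knowledge regularizer. The sketch below assumes one plausible form: a quadratic penalty lam/2 * ||theta - theta_prior||^2 on the actor weights, added to the DDPG actor objective. It also shows the mean squared Bellman error used for the critic. All of the following are hypothetical and used only for illustration:

- the function names `msbe` and `knowledge_regularized_actor_step`,
- the parameters `lr` and `lam`,
- the toy linear actor and the toy quadratic critic.

The sketch is meant to show how one update combines the critic's gradient, which pulls toward the new task, with a term that pulls toward the existing policy. It is not the authors' implementation.

```python
import numpy as np


def msbe(q_values, rewards, next_q_values, dones, gamma=0.99):
    """Mean squared Bellman error for a DDPG-style critic.

    next_q_values are assumed to come from a target critic evaluated at
    (s', mu_target(s')); the Bellman targets are treated as constants.
    """
    targets = rewards + gamma * (1.0 - dones) * next_q_values
    return np.mean((q_values - targets) ** 2)


def knowledge_regularized_actor_step(theta, theta_prior, states, dq_da,
                                     lr=0.1, lam=0.5):
    """One gradient-ascent step for a linear actor mu(s) = states @ theta.

    Assumed (hypothetical) objective:
        J(theta) = mean_s Q(s, mu_theta(s)) - lam / 2 * ||theta - theta_prior||^2
    dq_da holds dQ/da for each state in the batch, as supplied by the critic.
    """
    # Deterministic policy gradient: dQ/dtheta = dQ/da * da/dtheta, with da/dtheta = s.
    policy_grad = (dq_da[:, None] * states).mean(axis=0)
    # Gradient of the quadratic penalty; subtracting it pulls theta toward theta_prior.
    prior_grad = lam * (theta - theta_prior)
    return theta + lr * (policy_grad - prior_grad)


# Toy demo: critic Q(s, a) = -(a - s @ w_opt)^2 stands in for a learned critic.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 2))
w_opt = np.array([0.8, -0.3])          # optimum of the new task (hypothetical)
theta_prior = np.array([0.7, -0.2])    # weights of the existing policy (hypothetical)
theta = theta_prior.copy()             # transfer: start from the existing policy
for _ in range(200):
    actions = states @ theta
    dq_da = -2.0 * (actions - states @ w_opt)
    theta = knowledge_regularized_actor_step(theta, theta_prior, states, dq_da)
print(theta)
```

With this toy critic, theta settles at a point between the new task's optimum and the prior weights. A larger `lam` keeps the learned policy closer to the existing one. This is the trade-off a regularizer of this assumed form introduces: it helps when the existing policy is close to a good solution and biases the result when it is not.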