Deep reinforcement learning with successor features for navigation across similar environments

In this paper we consider the problem of robot navigation in simple maze-like environments where the robot has to rely on its onboard sensors to perform the navigation task. In particular, we are interested in solutions to this problem that do not require localization, mapping or planning. Additionally, we require that our solution can quickly adapt to new situations (e.g., changing navigation goals and environments). To meet these criteria we frame this problem as a sequence of related reinforcement learning tasks. We propose a successor-feature-based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances. Our algorithm substantially decreases the required learning time after the first task instance has been solved, which makes it easily adaptable to changing environments. We validate our method in both simulated and real robot experiments with a Robotino and compare it to a set of baseline methods including classical planning-based navigation.
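
For context, the successor-feature decomposition the abstract refers to can be summarized as follows. This is a minimal sketch in the spirit of Dayan's successor representation and its feature-based generalization; the specific features \phi and task weights w used by the authors are assumptions here, not details taken from the abstract. Suppose the one-step reward factorizes as r(s, a, s') = \phi(s, a, s')^\top w. Then the action-value function of any policy \pi factorizes accordingly:

    Q^\pi(s, a) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t\, \phi(s_t, a_t, s_{t+1})^\top w \,\Big|\, s_0 = s,\, a_0 = a\Big] = \psi^\pi(s, a)^\top w,

where \psi^\pi(s, a) = \mathbb{E}_\pi\big[\sum_{t=0}^{\infty} \gamma^t \phi(s_t, a_t, s_{t+1}) \mid s_0 = s, a_0 = a\big] are the successor features. They satisfy a Bellman equation,

    \psi^\pi(s, a) = \mathbb{E}\big[\phi(s, a, s')\big] + \gamma\, \mathbb{E}\big[\psi^\pi(s', \pi(s'))\big],

so they can be learned with standard temporal-difference methods. A new task (e.g., a new navigation goal or environment) that primarily changes the reward weights w can then reuse the learned \psi^\pi, which is the mechanism behind the fast adaptation claimed above.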
