AIMS CDT Project Report: Towards One-Shot Learning From Demonstration via Reinforcement Learning

We explore meta-learning algorithms and architectures for one-shot learning from demonstration via reinforcement learning. We provide evidence that Reptile is not effective at meta-learning in reinforcement learning environments, and we present preliminary findings on how effectively gated recurrent units (GRUs) achieve ‘fast adaptation’ to tasks in such environments.
