An asynchronous framework with the Reptile+ algorithm to meta-learn partially observable Markov decision processes

Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) domains. Without meta-learning, a deep neural network controller must be trained from scratch on each specific control task using a large amount of data. This way of training has significant limitations when handling different but related tasks. Meta-learning on control domains has therefore become a powerful tool for transfer learning across related tasks. However, it is widely known that meta-learning requires massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient). HCGF-R2-DDPG integrates meta-learning into a general asynchronous training architecture. The proposed framework utilises both the CPU and the GPU to boost the training speed of the meta-network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.
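At the heart of such a framework is a Reptile-style first-order meta-update, which moves the shared initialisation towards the weights obtained after inner-loop training on each sampled task. The sketch below illustrates this outer loop in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the exact Reptile+ variant, the recurrent DDPG inner loop, and the asynchronous CPU/GPU workers are abstracted away, and sample_task and inner_train are hypothetical placeholders.

    # Minimal sketch of a Reptile-style outer-loop meta-update (assumed form;
    # the paper's Reptile+ variant and asynchronous workers are not shown).
    import copy
    import torch

    def reptile_meta_update(meta_net, sample_task, inner_train,
                            meta_lr=0.1, meta_iterations=1000):
        """Outer loop: theta <- theta + meta_lr * (phi_task - theta)."""
        for _ in range(meta_iterations):
            task = sample_task()            # hypothetical task sampler, e.g. a POMDP variant
            adapted = copy.deepcopy(meta_net)
            inner_train(adapted, task)      # inner loop, e.g. recurrent DDPG on the sampled task
            with torch.no_grad():
                for meta_p, task_p in zip(meta_net.parameters(),
                                          adapted.parameters()):
                    # Move the meta-initialisation towards the task-adapted weights.
                    meta_p += meta_lr * (task_p - meta_p)
        return meta_net

In an asynchronous hybrid CPU/GPU setting, the inner-loop rollouts would typically run on CPU workers while the network updates run on the GPU; the sketch keeps everything sequential for clarity.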
