An asynchronous framework with the Reptile+ algorithm to meta-learn partially observable Markov decision processes

Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) domains. Without meta-learning, a deep neural network controller must be trained from scratch on each specific control task using a large amount of data. This way of training has significant limitations when handling different but related tasks. Meta-learning on control domains has therefore become a powerful tool for transfer learning across related tasks. However, it is widely known that meta-learning requires massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient). HCGF-R2-DDPG integrates meta-learning into a general asynchronous training architecture. The proposed framework utilises both the CPU and the GPU to boost the training speed of the meta-network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.
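At the heart of such a framework is a Reptile-style first-order meta-update, which moves the shared initialisation towards the weights obtained after inner-loop training on each sampled task. The sketch below illustrates this outer loop in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the exact Reptile+ variant, the recurrent DDPG inner loop, and the asynchronous CPU/GPU workers are abstracted away, and sample_task and inner_train are hypothetical placeholders.

    # Minimal sketch of a Reptile-style outer-loop meta-update (assumed form;
    # the paper's Reptile+ variant and asynchronous workers are not shown).
    import copy
    import torch

    def reptile_meta_update(meta_net, sample_task, inner_train,
                            meta_lr=0.1, meta_iterations=1000):
        """Outer loop: theta <- theta + meta_lr * (phi_task - theta)."""
        for _ in range(meta_iterations):
            task = sample_task()            # hypothetical task sampler, e.g. a POMDP variant
            adapted = copy.deepcopy(meta_net)
            inner_train(adapted, task)      # inner loop, e.g. recurrent DDPG on the sampled task
            with torch.no_grad():
                for meta_p, task_p in zip(meta_net.parameters(),
                                          adapted.parameters()):
                    # Move the meta-initialisation towards the task-adapted weights.
                    meta_p += meta_lr * (task_p - meta_p)
        return meta_net

In an asynchronous hybrid CPU/GPU setting, the inner-loop rollouts would typically run on CPU workers while the network updates run on the GPU; the sketch keeps everything sequential for clarity.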
