Effective Policy Adjustment via Meta-Learning for Complex Manipulation Tasks

The ability to adjust its policy is key for an agent learning to make decisions in complex manipulation tasks. To address this problem while balancing exploration and exploitation, we propose a novel deep reinforcement learning algorithm that combines Hindsight Experience Replay (HER) with Model-Agnostic Meta-Learning (MAML). HER provides relatively effective exploration by converting a single-goal task into a multi-goal one in environments where rewards are sparse and binary, so that better policies can be found from failed transition trajectories as well as successful ones; MAML strengthens exploitation, allowing the proposed algorithm to learn faster and adjust the policy model from limited experience within a few iterations. Extensive simulations on complex object-manipulation tasks with a robotic arm show that HER integrated with MAML accelerates fine-tuning of the original policy-gradient reinforcement learning with a neural-network policy and also improves the success rate.
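
The abstract only names the two building blocks, so the following Python sketch illustrates, under stated assumptions, how HER-style goal relabelling (turning a failed single-goal trajectory into a useful multi-goal one under a sparse binary reward) can sit alongside a MAML-style inner/outer update that adapts the policy from limited experience in a few gradient steps. The reward tolerance, network sizes, surrogate loss, batch/task layout, and every function name below are illustrative assumptions, not the authors' implementation; a recent PyTorch (with torch.func) is assumed.

```python
# Minimal sketch (not the paper's code) of HER relabelling + a MAML-style meta-update.
import torch
import torch.nn as nn
from torch.func import functional_call


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Binary reward typical of HER-style tasks: 0 on success, -1 otherwise."""
    return 0.0 if torch.norm(achieved_goal - goal) < tol else -1.0


def her_relabel(trajectory):
    """Relabel a (possibly failed) trajectory with the goal it actually achieved
    (the 'final' HER strategy), so it still yields learning signal.
    Each step is assumed to be a dict with 'achieved_goal' tensors."""
    new_goal = trajectory[-1]["achieved_goal"]
    return [dict(step, goal=new_goal,
                 reward=sparse_reward(step["achieved_goal"], new_goal))
            for step in trajectory]


class GoalPolicy(nn.Module):
    """Deterministic policy conditioned on observation and goal (sizes are assumptions)."""
    def __init__(self, obs_dim=10, goal_dim=3, act_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))


def surrogate_loss(policy, params, batch):
    """Illustrative surrogate: regress toward actions from relabelled (now
    successful) transitions; a real agent would use an actor-critic / policy-gradient loss."""
    obs, goal, act = batch
    pred = functional_call(policy, params, (obs, goal))
    return ((pred - act) ** 2).mean()


def maml_step(policy, tasks, meta_opt, inner_lr=0.01):
    """One MAML meta-update: adapt the parameters per task with a single inner
    gradient step, then update the shared initialisation on the post-adaptation
    (query) losses so it fine-tunes quickly on new goals."""
    theta = dict(policy.named_parameters())
    names = list(theta.keys())
    meta_loss = 0.0
    for support, query in tasks:          # each task: (support_batch, query_batch)
        inner = surrogate_loss(policy, theta, support)
        grads = torch.autograd.grad(inner, list(theta.values()),
                                    create_graph=True)
        # Fast weights after one inner step, kept in the graph so the
        # meta-gradient flows through the adaptation.
        fast = {n: theta[n] - inner_lr * g for n, g in zip(names, grads)}
        meta_loss = meta_loss + surrogate_loss(policy, fast, query)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

In this sketch each task would be a pair of HER-relabelled batches (support for the inner adaptation step, query for the meta-update) drawn for a particular goal distribution, with meta_opt being, for example, torch.optim.Adam(policy.parameters(), lr=1e-3); the exact way the paper interleaves relabelling with the meta-loop is not specified in the abstract.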
