A Novel Heterogeneous Actor-Critic Algorithm with Recent-Emphasizing Replay Memory

Reinforcement learning (RL) algorithms have been shown to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit their further application. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm that contains both a soft Q-function and an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, while the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to handle sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our method on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, validating its effectiveness.
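The abstract does not specify the exact sampling rule of the recent-emphasizing replay memory, so the sketch below is only an illustrative assumption of the general idea: transitions are drawn with a probability that decays with their age in the buffer, so recently collected experience is replayed more often. The class name `RecentEmphasizingReplayBuffer`, the `decay` parameter, and the geometric weighting schedule are all hypothetical choices, not the paper's actual scheme.

```python
import numpy as np
from collections import deque


class RecentEmphasizingReplayBuffer:
    """Hedged sketch of a replay buffer that favors recently added transitions.

    The geometric age-based weighting below is an assumption made for
    illustration; the paper's exact sampling technique is not given in the abstract.
    """

    def __init__(self, capacity=1_000_000, decay=0.996):
        self.decay = decay                    # assumed per-step weight decay with age
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        n = len(self.storage)
        # Newest transition has age 0 and weight decay**0 = 1; older transitions
        # receive exponentially smaller weights, emphasizing recent experience.
        ages = np.arange(n)[::-1]
        weights = self.decay ** ages
        probs = weights / weights.sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        batch = [self.storage[i] for i in idx]
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones
```

In an off-policy loop such as HAC's, each gradient step would draw a minibatch from this buffer in place of uniform sampling; the decay rate trades off how strongly recent transitions dominate the updates against how much older experience is still reused.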
