Distributed Continuous Control with Meta Learning on Robotic Arms

Deep reinforcement learning methods such as Deep Q-Learning (DQN) and Policy Gradient (PG) have been proposed to train control agents for robotic arms. Deep Deterministic Policy Gradient (DDPG) takes advantage of a deterministic policy instead of a stochastic one to further simplify training and improve performance. Reinforcement learning takes the reward from the environment and trains the underlying control agent to achieve the task. An appropriate reward yields better performance and shorter training time, but defining an appropriate reward function requires domain knowledge and trial and error. In this paper, we propose a method based on DDPG that makes use of Prioritized Experience Replay (PER), Asynchronous Agent Learning, and Meta Learning. The proposed Meta Learning approach uses multiple distributed learners, called workers, to learn from consecutive previous states and rewards. Simulations are performed on 6-DOF (IRB140) and 7-DOF (LBR iiwa 14 R820) robotic arms to train control agents to reach random targets in three-dimensional space. The experiments show that the proposed algorithm outperforms DDPG with a specialized reward function in both task success rate and training speed.
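
To make the Prioritized Experience Replay component concrete, the sketch below shows a minimal proportional PER buffer in Python, following Schaul et al.'s formulation (priorities proportional to the TD-error magnitude, with importance-sampling weights). The class and parameter names are illustrative assumptions for demonstration, not the implementation used in this paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities bias sampling
        self.beta = beta          # strength of importance-sampling correction
        self.eps = eps            # keeps priorities strictly positive
        self.storage = []         # (state, action, reward, next_state, done) tuples
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; normalized by the maximum weight for stability.
        weights = (len(self.storage) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In an asynchronous setting, each worker would push the transitions it collects into such a buffer and the learner would call `sample` and `update_priorities` around each DDPG update; that wiring is omitted here for brevity.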