Distributed Continuous Control with Meta Learning on Robotic Arms

Deep reinforcement learning methods such as Deep Q-Learning (DQN) and Policy Gradient (PG) have been proposed to train control agents for robotic arms. Deep Deterministic Policy Gradient (DDPG) takes advantage of a deterministic policy instead of a stochastic one to further simplify training and improve performance. Reinforcement learning takes the reward from the environment and trains the underlying control agent to achieve the task. An appropriate reward yields better performance and shorter training time, but defining an appropriate reward function requires domain knowledge and trial and error. In this paper, we propose a method based on DDPG that makes use of Prioritized Experience Replay (PER), Asynchronous Agent Learning, and Meta Learning. The proposed Meta Learning approach uses multiple distributed learners, called workers, to learn from consecutive previous states and rewards. Simulations are performed on 6-DOF (IRB140) and 7-DOF (LBR iiwa 14 R820) robotic arms to train control agents to reach random targets in three-dimensional space. The experiments show that the proposed algorithm outperforms DDPG with a specialized reward function in both task success rate and training speed.
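
To make the Prioritized Experience Replay component concrete, the sketch below shows a minimal proportional PER buffer in Python, following Schaul et al.'s formulation (priorities proportional to the TD-error magnitude, with importance-sampling weights). The class and parameter names are illustrative assumptions for demonstration, not the implementation used in this paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities bias sampling
        self.beta = beta          # strength of importance-sampling correction
        self.eps = eps            # keeps priorities strictly positive
        self.storage = []         # (state, action, reward, next_state, done) tuples
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; normalized by the maximum weight for stability.
        weights = (len(self.storage) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In an asynchronous setting, each worker would push the transitions it collects into such a buffer and the learner would call `sample` and `update_priorities` around each DDPG update; that wiring is omitted here for brevity.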