Path Planning of Humanoid Arm Based on Deep Deterministic Policy Gradient

A robot arm with multiple degrees of freedom operating in 3D space must steer its end effector around obstacles while grasping, so obstacle-avoiding path planning is essential to completing a grasping task. This paper proposes a new obstacle avoidance algorithm built on an existing deep reinforcement learning framework, the deep deterministic policy gradient (DDPG). Specifically, we use DDPG to plan the trajectory of the robot arm so that it avoids obstacles. The reward function is designed to overcome the convergence difficulty that arises with multiple reward terms, especially when those terms are antagonistic to one another. Obstacle avoidance is thus achieved through self-learning, resolving the convergence problem caused by high-dimensional state inputs and multiple return values. A simulation model of one arm of the Nao robot is built in the MuJoCo simulation environment, and the simulation demonstrates that the proposed algorithm successfully enables the arm to avoid obstacles.
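The central design question the abstract raises is how to combine antagonistic reward terms (approach the goal vs. stay away from the obstacle) without destroying convergence. The paper's exact reward formulation is not reproduced here; the sketch below is a minimal, illustrative shaping under assumed terms and weights (`w_goal`, `w_obs`, `safe_radius` are hypothetical parameters, not the authors' values): a dense attractive term toward the goal, plus a repulsive penalty that activates only inside a safety radius so the two terms do not fight over the whole workspace.

```python
import numpy as np

def composite_reward(ee_pos, goal_pos, obstacle_pos,
                     safe_radius=0.10, w_goal=1.0, w_obs=0.5):
    """Blend two antagonistic reward terms for a DDPG agent.

    NOTE: illustrative assumption only; the paper's actual reward
    design, weights, and safety radius are not specified here.
    """
    goal_dist = np.linalg.norm(ee_pos - goal_pos)
    obs_dist = np.linalg.norm(ee_pos - obstacle_pos)

    # Attractive term: dense negative distance, so the reward grows
    # (toward zero) as the end effector approaches the goal.
    r_goal = -goal_dist

    # Repulsive term: penalize only when the end effector enters the
    # safety radius around the obstacle; zero elsewhere, which keeps
    # the two objectives from conflicting far from the obstacle.
    if obs_dist < safe_radius:
        r_obs = -(safe_radius - obs_dist) / safe_radius
    else:
        r_obs = 0.0

    return w_goal * r_goal + w_obs * r_obs

# Example: end effector between the goal and a nearby obstacle.
ee = np.array([0.20, 0.10, 0.30])
goal = np.array([0.35, 0.10, 0.30])
obstacle = np.array([0.22, 0.12, 0.30])
print(composite_reward(ee, goal, obstacle))
```

In a DDPG training loop this scalar would be the per-step reward fed to the critic; gating the obstacle penalty to a local region is one common way to keep the combined signal from oscillating when the goal lies near the obstacle.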
