Deep Reinforcement Learning With Optimized Reward Functions for Robotic Trajectory Planning

This work aims to improve the efficiency of deep reinforcement learning (DRL)-based methods for robotic trajectory planning in unstructured working environments with obstacles. In contrast to the traditional sparse reward function, this paper presents two novel dense reward functions. First, an azimuth reward function is proposed to accelerate the learning process locally and yield more reasonable trajectories by modeling position and orientation constraints, which dramatically reduces the blindness of exploration. To further improve efficiency, a subtask-level reward function is proposed to provide global guidance for the agent. The subtask-level reward function is designed under the assumption that the task can be divided into several subtasks, which greatly reduces invalid exploration. Extensive experiments show that the proposed reward functions improve the convergence rate by up to three times with state-of-the-art DRL methods: the convergence mean increases by 2.25%–13.22%, and the standard deviation decreases by 10.8%–74.5%.
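The abstract does not give the paper's exact formulas, but the two ideas can be illustrated with a minimal sketch. Assuming a planar agent with position `pos` and heading angle `heading`, an azimuth-style dense reward might penalize both the distance to the goal and the misalignment between the heading and the goal direction, while a subtask-level reward might grant a one-time bonus for reaching each ordered waypoint. All names, weights (`w_dist`, `w_angle`, `bonus`, `tol`), and the waypoint representation of subtasks below are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def azimuth_reward(pos, goal, heading, w_dist=1.0, w_angle=0.5):
    """Dense local reward (sketch, not the paper's formula): penalizes
    the Euclidean distance to the goal and the angular error between
    the agent's heading and the goal direction, steering exploration
    toward the target from the first step."""
    diff = goal - pos
    dist = np.linalg.norm(diff)
    goal_dir = np.arctan2(diff[1], diff[0])
    # Wrap the heading error into [-pi, pi] before taking its magnitude.
    angle_err = np.abs((goal_dir - heading + np.pi) % (2 * np.pi) - np.pi)
    return -(w_dist * dist + w_angle * angle_err)

def subtask_reward(pos, waypoints, reached, bonus=10.0, tol=0.05):
    """Subtask-level reward (sketch): the task is split into ordered
    waypoints standing in for subtasks; a one-time bonus is granted
    when the next waypoint is reached, providing global guidance."""
    if reached < len(waypoints) and np.linalg.norm(waypoints[reached] - pos) < tol:
        return bonus, reached + 1
    return 0.0, reached

def total_reward(pos, heading, goal, waypoints, reached):
    """Combined shaped reward: local azimuth term plus global subtask bonus."""
    r_sub, reached = subtask_reward(pos, waypoints, reached)
    return azimuth_reward(pos, goal, heading) + r_sub, reached
```

In this sketch the azimuth term supplies a gradient at every step (replacing a sparse goal-only reward), while the waypoint bonuses partition the episode into subgoals; how the two terms are actually weighted and how subtasks are defined would follow the paper's formulation.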
