Learning-Based End-to-End Path Planning for Lunar Rovers with Safety Constraints

Path planning is an essential technology that allows lunar rovers to carry out safe and efficient autonomous exploration missions. This paper proposes a learning-based end-to-end path planning algorithm for lunar rovers with safety constraints. First, a training environment that incorporates real lunar surface terrain data is built in the Gazebo simulator. A lunar rover simulator is created in this environment to reproduce the real lunar surface and the rover system. Next, an end-to-end path planning algorithm based on deep reinforcement learning is designed. The design covers the state space, the action space, the network structure, a reward function that accounts for slip behavior, and a training method based on proximal policy optimization (PPO). In addition, the network model is trained on a variety of training scenarios following the idea of curriculum learning, which improves its ability to generalize to different lunar surface topographies and environment scales. Simulation results show that the proposed algorithm achieves end-to-end path planning for the lunar rover, and that the paths it generates are safer than those produced by a classical path planning algorithm.
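The abstract names two training ingredients: a reward that accounts for slip and PPO optimization. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. `ppo_clip_objective` follows the standard clipped surrogate objective of PPO, averaged over a batch, and is meant to be maximized. `slip_ratio` uses a common longitudinal slip definition, s = 1 - v / (r·ω), for a driven wheel. `shaped_reward`, its weights, and the 0.2 slip limit are hypothetical. They show only one plausible way a slip penalty could enter a per-step reward. The paper's actual reward terms are not given here.

```python
def slip_ratio(wheel_radius, wheel_rate, body_speed):
    """Longitudinal slip ratio s = 1 - v / (r * omega) for a driven wheel.

    Returns 0.0 when the wheel is not driven forward, avoiding division by zero.
    """
    commanded_speed = wheel_radius * wheel_rate
    if commanded_speed <= 0.0:
        return 0.0
    return 1.0 - body_speed / commanded_speed


def shaped_reward(prev_dist, dist, slip, slip_limit=0.2,
                  progress_weight=1.0, slip_weight=5.0):
    """HYPOTHETICAL per-step reward (not from the paper).

    Rewards progress toward the goal and subtracts a penalty that grows
    linearly once slip exceeds an assumed safe limit.
    """
    progress = progress_weight * (prev_dist - dist)
    excess_slip = max(0.0, slip - slip_limit)
    return progress - slip_weight * excess_slip


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), batch-averaged.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = A_i.
    Each term is min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    s = slip_ratio(wheel_radius=0.1, wheel_rate=10.0, body_speed=0.8)
    print("slip:", s)
    print("reward:", shaped_reward(prev_dist=5.0, dist=4.5, slip=s))
    print("L_clip:", ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

The clipping keeps a single policy update from moving the probability ratio far outside [1 - eps, 1 + eps]. This is the property that makes PPO a stable choice for training a planner in simulation. In a gradient-based framework, the loss to minimize would be the negative of this objective.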
