Path planning for asteroid hopping rovers with pre-trained deep reinforcement learning architectures

Abstract: Asteroid surface exploration is challenging due to complex terrain topology and irregular gravity fields. Hopping rovers are considered a promising mobility solution for exploring the surfaces of small celestial bodies. Conventional path-planning tasks, such as traversing a given map to reach a known target, can become particularly challenging for hopping rovers when the terrain exhibits sufficiently complex 3-D structure. As an alternative to traditional path-planning approaches, this work explores applying deep reinforcement learning (DRL) to plan the path of a hopping rover across a highly irregular surface. The 3-D terrain of the asteroid surface is converted into a level matrix, which serves as the input to the reinforcement learning algorithm. A DRL architecture with good convergence and stability properties is presented to solve the rover path-planning problem. Numerical simulations validate the effectiveness and robustness of the proposed method on two different types of 3-D terrain.
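The abstract's pipeline (quantize a 3-D heightmap into a level matrix, then learn a traversal policy with reinforcement learning) can be illustrated with a minimal sketch. This is not the paper's architecture: the function names, the number of levels, the reward shaping (penalizing large level jumps between adjacent cells), and the use of tabular Q-learning in place of a deep network are all illustrative assumptions.

```python
import numpy as np

def terrain_to_level_matrix(heightmap, n_levels=8):
    """Quantize a continuous 2-D heightmap into integer terrain levels."""
    # Interior bin edges split [min, max] into n_levels equal bands.
    edges = np.linspace(heightmap.min(), heightmap.max(), n_levels + 1)[1:-1]
    return np.digitize(heightmap, edges)          # entries in 0..n_levels-1

def q_learning_path(levels, start, goal, episodes=2000,
                    alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on the level matrix; hops across a large level
    difference are penalized in proportion to that difference."""
    rng = np.random.default_rng(seed)
    rows, cols = levels.shape
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
    Q = np.zeros((rows, cols, len(moves)))
    for _ in range(episodes):
        r, c = start
        for _ in range(4 * rows * cols):          # cap episode length
            a = (int(rng.integers(len(moves))) if rng.random() < eps
                 else int(np.argmax(Q[r, c])))    # epsilon-greedy action
            dr, dc = moves[a]
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                nr, nc, reward = r, c, -5.0       # bounced off the map edge
            elif (nr, nc) == goal:                # terminal update, no bootstrap
                Q[r, c, a] += alpha * (100.0 - Q[r, c, a])
                break
            else:                                 # step cost + level-jump penalty
                reward = -1.0 - abs(int(levels[nr, nc]) - int(levels[r, c]))
            Q[r, c, a] += alpha * (reward + gamma * Q[nr, nc].max() - Q[r, c, a])
            r, c = nr, nc
    # Greedy rollout of the learned policy from start to goal.
    path, (r, c) = [start], start
    while (r, c) != goal and len(path) < rows * cols:
        dr, dc = moves[int(np.argmax(Q[r, c]))]
        r = min(max(r + dr, 0), rows - 1)
        c = min(max(c + dc, 0), cols - 1)
        path.append((r, c))
    return path
```

A deep variant would replace the Q table with a network that maps the level matrix (plus rover and goal positions) to action values, which is what makes the approach scale to the large, highly irregular terrains the paper targets.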
