Rover-IRL: Inverse Reinforcement Learning With Soft Value Iteration Networks for Planetary Rover Path Planning

Planetary rovers, such as those currently on Mars, face difficult path planning problems, both during pre-landing mission planning and once on the ground. In this work, we present a new approach to these planning problems based on inverse reinforcement learning, using deep convolutional networks and value iteration networks (VIN) as important internal structures. VIN approximate the value iteration (VI) algorithm with convolutional neural networks, making VI fully differentiable. We propose a modification to the value iteration recurrence, referred to as the soft value iteration network (SVIN), designed to produce more effective training gradients through the VIN. It relies on an internal soft policy model, in which the policy is represented as a probability distribution over all possible actions rather than a deterministic policy that returns only the best action. We demonstrate the effectiveness of the proposed architecture on both a grid world dataset and a highly realistic synthetic dataset generated from currently deployed rover mission planning tools and real Mars imagery.
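The soft value iteration recurrence described above replaces the hard max over actions in standard VI with an expectation under a softmax policy. The following tabular Python sketch is only an illustration of that backup under our own assumptions (shapes, names, and the tabular setting are hypothetical); the paper's SVIN embeds the recurrence in convolutional layers over a grid map rather than explicit transition matrices.

import numpy as np

def soft_value_iteration(R, P, gamma=0.95, iters=100):
    # R : (S, A) reward for taking action a in state s (illustrative)
    # P : (A, S, S) transition probabilities P[a, s, s'] (illustrative)
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        # Soft policy: a probability distribution over actions (softmax of Q)
        pi = np.exp(Q - Q.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # Soft backup: expected Q under the soft policy replaces max_a Q(s, a)
        V = (pi * Q).sum(axis=1)
    return V

Because every action contributes to the backup, gradients can flow through all action channels rather than only the argmax action, which is the motivation the abstract gives for the SVIN modification.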
