PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning

We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planner builds a roadmap that connects robot configurations between which the RL agent can navigate successfully. The same RL agents then control the robot under the direction of the planner, enabling long-range navigation. We use probabilistic roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed with feature-based and deep neural network policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential-drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load-displacement constraints. Our results show improved task completion over both the RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 m long under noisy sensor conditions, and in the aerial cargo delivery task it completes flights of over 1000 m without violating the task constraints, in an environment 63 million times larger than the one used in training.
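
The key mechanism is the roadmap connection test: rather than a straight-line local planner, two sampled configurations are linked only when the trained short-range RL agent can actually navigate between them, so every roadmap edge is executable by the same policy at run time. Below is a minimal Python sketch of this idea, assuming hypothetical helpers `sample_free_config()` (draws a collision-free configuration) and `rollout(policy, start, goal)` (simulates the short-range policy once and returns True on success); the function names, parameter values, and success threshold are illustrative, not the paper's implementation.

```python
import math

def build_prm_rl_roadmap(policy, sample_free_config, rollout,
                         n_nodes=100, k_neighbors=5,
                         n_trials=20, success_threshold=0.9):
    """Return (nodes, edges): sampled configurations and the edges
    the short-range RL policy can reliably traverse."""
    nodes = [sample_free_config() for _ in range(n_nodes)]
    edges = {i: set() for i in range(n_nodes)}

    for i in range(n_nodes):
        # Candidate neighbors: the k nearest nodes by Euclidean distance.
        nearest = sorted((j for j in range(n_nodes) if j != i),
                         key=lambda j: math.dist(nodes[i], nodes[j]))[:k_neighbors]
        for j in nearest:
            if j in edges[i]:
                continue  # already connected from the other direction
            # Monte Carlo connectivity check: keep the edge only if the
            # RL policy reaches the goal often enough under its own
            # dynamics and noise, instead of a geometric collision check.
            hits = sum(rollout(policy, nodes[i], nodes[j])
                       for _ in range(n_trials))
            if hits / n_trials >= success_threshold:
                edges[i].add(j)
                edges[j].add(i)

    return nodes, edges
```

At query time, a graph search over this roadmap yields a sequence of waypoints, and the same short-range policy is invoked to traverse each edge in turn, which is how the planner directs the RL agent over long ranges.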
