Learning Navigation Behaviors End-to-End

A longstanding goal of behavior-based robotics is to solve high-level navigation tasks using end-to-end navigation behaviors that directly map sensors to actions. Navigation behaviors, such as reaching a goal or following a path without collisions, can be learned from exploration and interaction with the environment, but they are constrained by the type and quality of a robot’s sensors, dynamics, and actuators. Traditional motion planning handles varied robot geometry and dynamics but typically assumes high-quality observations. Modern vision-based navigation typically considers imperfect or partial observations but simplifies the robot action space. With both approaches, the transition from simulation to reality can be difficult. Here, we learn two end-to-end navigation behaviors that avoid moving obstacles: point-to-point and path following. These policies receive noisy lidar observations and output robot linear and angular velocities. We train them in small, static environments with Shaped-DDPG, an adaptation of the Deep Deterministic Policy Gradient (DDPG) reinforcement learning method that optimizes both the reward shaping and the network architecture. Over 500 meters of on-robot experiments show that these policies generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. The path-following and point-to-point policies are 83% and 56% more successful than the baseline, respectively.
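
The abstract characterizes Shaped-DDPG only at a high level: an outer optimization over the shaped reward and the network architecture, wrapped around an ordinary DDPG training run. The sketch below illustrates that structure under loose assumptions; the shaping weights, the random-search strategy, and the stubbed training and evaluation functions are hypothetical placeholders, not the paper's actual objective or implementation.

```python
import random
from dataclasses import dataclass


# Hypothetical reward-shaping and architecture parameters searched by the
# outer loop; the real Shaped-DDPG parameterization may differ.
@dataclass
class ShapingParams:
    w_goal_distance: float   # reward weight for progress toward the goal
    w_collision: float       # penalty weight for (near-)collisions
    w_step: float            # small per-step penalty to encourage progress
    hidden_units: int        # actor/critic layer width (architecture choice)


def train_ddpg_policy(params: ShapingParams, steps: int = 10_000):
    """Placeholder for a standard DDPG training run using the shaped reward
    defined by `params`. A real implementation would build the actor/critic
    networks, replay buffer, and simulated environment here."""
    return {"params": params, "steps": steps}  # stub "policy" object


def evaluate_success_rate(policy) -> float:
    """Placeholder: roll out the policy on held-out navigation tasks and
    return the fraction of successful, collision-free episodes."""
    return random.random()  # stub score standing in for true evaluation


def shaped_ddpg_search(num_trials: int = 20) -> ShapingParams:
    """Outer loop: sample shaping/architecture candidates, train a DDPG
    policy for each, and keep the candidate with the best success rate."""
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        candidate = ShapingParams(
            w_goal_distance=random.uniform(0.0, 1.0),
            w_collision=random.uniform(-1.0, 0.0),
            w_step=random.uniform(-0.1, 0.0),
            hidden_units=random.choice([64, 128, 256]),
        )
        policy = train_ddpg_policy(candidate)
        score = evaluate_success_rate(policy)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params


if __name__ == "__main__":
    print(shaped_ddpg_search())
```

In the actual system, the inner training call would be a full DDPG run and the outer loop would more plausibly use a black-box optimizer than naive random sampling; the sketch only conveys the two-level structure the abstract describes.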
