Where to go Next: Learning a Subgoal Recommendation Policy for Navigation in Dynamic Environments

Robotic navigation in environments shared with other robots or humans remains challenging because the intentions of the surrounding agents are not directly observable and the environmental conditions change continuously. Local trajectory optimization methods, such as model predictive control (MPC), can adapt to these changes but require global guidance, which is not trivial to obtain in crowded scenarios. This paper proposes to learn, via deep reinforcement learning (RL), an interaction-aware policy that provides long-term guidance to the local planner. In particular, in simulations with cooperative and non-cooperative agents, we train a deep network to recommend a subgoal for the MPC planner. The recommended subgoal is expected to help the robot make progress towards its goal while accounting for the expected interactions with other agents. Given the recommended subgoal, the MPC planner then optimizes the robot's inputs subject to its kinodynamic and collision-avoidance constraints. Our approach substantially reduces the number of collisions compared to prior MPC frameworks, and improves both travel time and number of collisions compared to deep RL methods, in cooperative, competitive, and mixed multi-agent scenarios.
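The two-level loop described above can be sketched in a few lines. The following toy example is an illustrative assumption, not the paper's implementation: `recommend_subgoal` stands in for the learned RL policy (here a hand-coded heuristic that steps toward the goal and shifts the subgoal away from the nearest agent), and `mpc_step` stands in for the MPC planner (here a one-step grid search over velocities that enforces a speed limit and a collision-avoidance radius). All function names and parameters are hypothetical.

```python
import math

def recommend_subgoal(robot, goal, agents, lookahead=2.0):
    """Stand-in for the learned policy: propose a subgoal toward the goal,
    nudged away from the nearest nearby agent (hand-coded heuristic)."""
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    dist = math.hypot(dx, dy) or 1e-9
    sx, sy = robot[0] + lookahead * dx / dist, robot[1] + lookahead * dy / dist
    if agents:
        ax, ay = min(agents, key=lambda a: math.hypot(a[0] - robot[0], a[1] - robot[1]))
        ad = math.hypot(ax - robot[0], ay - robot[1])
        if ad < lookahead:  # shift the subgoal away from the close agent
            sx += (robot[0] - ax) / (ad or 1e-9)
            sy += (robot[1] - ay) / (ad or 1e-9)
    return sx, sy

def mpc_step(robot, subgoal, agents, v_max=1.0, dt=0.5, r_safe=0.6):
    """Toy one-step 'MPC': grid-search candidate velocities, discard those that
    violate the speed or collision-avoidance constraints, and keep the input
    whose next position is closest to the recommended subgoal."""
    best, best_cost = (0.0, 0.0), float("inf")
    for i in range(-4, 5):
        for j in range(-4, 5):
            vx, vy = v_max * i / 4, v_max * j / 4
            if math.hypot(vx, vy) > v_max:          # kinodynamic constraint
                continue
            nx, ny = robot[0] + vx * dt, robot[1] + vy * dt
            if any(math.hypot(nx - ax, ny - ay) < r_safe for ax, ay in agents):
                continue                            # collision-avoidance constraint
            cost = math.hypot(nx - subgoal[0], ny - subgoal[1])
            if cost < best_cost:
                best, best_cost = (vx, vy), cost
    return best

# Closed loop: the recommender provides long-term guidance, the local
# optimizer produces constraint-satisfying inputs at each step.
robot, goal = (0.0, 0.0), (5.0, 0.0)
agents = [(1.0, 0.1)]  # one static non-cooperative agent blocking the direct path
for _ in range(20):
    sg = recommend_subgoal(robot, goal, agents)
    vx, vy = mpc_step(robot, sg, agents)
    robot = (robot[0] + vx * 0.5, robot[1] + vy * 0.5)
```

In the paper's actual framework the recommender is a deep network trained with RL and the planner solves a constrained multi-step optimal control problem; this sketch only mirrors the division of labor between the two components.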
