Where to go Next: Learning a Subgoal Recommendation Policy for Navigation in Dynamic Environments

Robotic navigation in environments shared with other robots or humans remains challenging because the intentions of the surrounding agents are not directly observable and the environment conditions change continuously. Local trajectory optimization methods, such as model predictive control (MPC), can handle these changes but require global guidance, which is nontrivial to obtain in crowded scenarios. This letter proposes to learn, via deep reinforcement learning (RL), an interaction-aware policy that provides long-term guidance to the local planner. Specifically, we train a deep network in simulations with cooperative and non-cooperative agents to recommend a subgoal for the MPC planner. The recommended subgoal is expected to help the robot make progress toward its goal while accounting for the expected interactions with other agents. Given this subgoal, the MPC planner then optimizes the robot's inputs subject to its kinodynamic and collision-avoidance constraints. Our approach is shown to substantially reduce the number of collisions compared with prior MPC frameworks, and to reduce both travel time and the number of collisions compared with deep RL methods, in cooperative, competitive, and mixed multi-agent scenarios.
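
The two-level structure described above (a policy that proposes a subgoal, and a local planner that tracks it under constraints in a receding-horizon loop) can be illustrated with a minimal sketch. This is not the authors' implementation and rests on strong assumptions: `recommend_subgoal` is a hypothetical hand-written heuristic standing in for the trained deep RL network, `plan_velocity` replaces the MPC with a brute-force search over constant-velocity candidates rather than a constrained optimizer, and other agents are assumed to move at constant velocity. All names and parameter values (`reach`, `safe_dist`, `horizon`, `v_max`) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    """Point agent: position (x, y) in metres, constant velocity (vx, vy) in m/s."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def recommend_subgoal(robot, goal, agents, reach=2.0, avoid_radius=1.5):
    """HYPOTHETICAL stand-in for the learned subgoal policy (not a trained network).

    Returns a point at most `reach` metres from the robot toward the goal,
    pushed radially away from any agent closer than `avoid_radius`.
    """
    dx, dy = goal[0] - robot.x, goal[1] - robot.y
    dist = math.hypot(dx, dy)
    if dist <= reach:
        return goal
    sx, sy = robot.x + reach * dx / dist, robot.y + reach * dy / dist
    for a in agents:
        ox, oy = sx - a.x, sy - a.y
        d = math.hypot(ox, oy)
        if 1e-9 < d < avoid_radius:
            # Move the subgoal out onto the agent's avoidance circle.
            sx += (avoid_radius - d) * ox / d
            sy += (avoid_radius - d) * oy / d
    return (sx, sy)


def rollout_clearance(robot, vel, agents, dt, horizon):
    """Smallest robot-agent distance over the horizon, assuming the robot holds
    `vel` and every agent keeps its current velocity (constant-velocity prediction)."""
    clearance = math.inf
    for t in range(1, horizon + 1):
        rx, ry = robot.x + vel[0] * dt * t, robot.y + vel[1] * dt * t
        for a in agents:
            ax, ay = a.x + a.vx * dt * t, a.y + a.vy * dt * t
            clearance = min(clearance, math.hypot(rx - ax, ry - ay))
    return clearance


def plan_velocity(robot, subgoal, agents, v_max=1.0, dt=0.1, horizon=10,
                  safe_dist=0.6, n_headings=16):
    """Sampling-based stand-in for the MPC (NOT a real constrained optimizer).

    Enumerates constant-velocity candidates within the speed bound v_max,
    discards those violating the collision constraint (clearance < safe_dist),
    and returns the one whose terminal position is closest to the subgoal.
    """
    candidates = [(0.0, 0.0)]
    for k in range(n_headings):
        th = 2.0 * math.pi * k / n_headings
        for speed in (0.5 * v_max, v_max):
            candidates.append((speed * math.cos(th), speed * math.sin(th)))
    best, best_cost = (0.0, 0.0), math.inf
    for vel in candidates:
        if rollout_clearance(robot, vel, agents, dt, horizon) < safe_dist:
            continue  # violates the collision-avoidance constraint
        ex = robot.x + vel[0] * dt * horizon - subgoal[0]
        ey = robot.y + vel[1] * dt * horizon - subgoal[1]
        # Terminal distance to the subgoal plus a small input-effort penalty.
        cost = math.hypot(ex, ey) + 0.01 * (vel[0] ** 2 + vel[1] ** 2)
        if cost < best_cost:
            best, best_cost = vel, cost
    return best  # falls back to stopping if no candidate is feasible


def run_episode(start, goal, agents, dt=0.1, max_steps=300, tol=0.3):
    """Receding-horizon loop: subgoal -> planner -> apply first step -> repeat."""
    robot = Agent(*start)
    for step in range(max_steps):
        if math.hypot(goal[0] - robot.x, goal[1] - robot.y) < tol:
            return True, step
        subgoal = recommend_subgoal(robot, goal, agents)
        robot.vx, robot.vy = plan_velocity(robot, subgoal, agents, dt=dt)
        robot.x += robot.vx * dt
        robot.y += robot.vy * dt
        for a in agents:  # non-reactive agents keep constant velocity
            a.x += a.vx * dt
            a.y += a.vy * dt
    return False, max_steps
```

For instance, `run_episode((0.0, 0.0), (5.0, 0.0), [Agent(2.5, -2.0, 0.0, 0.8)])` simulates one agent crossing the robot's path. The split mirrors the division of labour in the abstract: the subgoal generator only shapes *where* the planner heads next, while feasibility (speed limit, collision clearance) is enforced entirely inside the planner, so a poor subgoal can slow the robot down but cannot by itself command a constraint-violating input.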
