Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning

Developing a safe and efficient collision avoidance policy for multiple robots is challenging in decentralized scenarios where each robot plans its path without observing other robots' states and intents. Existing distributed multi-robot collision avoidance systems often require extracting agent-level features to plan a locally collision-free action, which can be computationally prohibitive and brittle. More importantly, in practice these methods perform considerably worse than their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems that directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we propose a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy finds time-efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy generalizes well to scenarios that never appear during training, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca.
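
To make the sensor-level policy concrete, the sketch below shows one plausible way a mapping from raw sensor readings to velocity commands could be structured as a Gaussian policy suitable for policy-gradient training. This is an illustrative PyTorch sketch, not the authors' released implementation; the scan dimension, hidden-layer sizes, and the choice of observation (a 2D laser scan plus the relative goal position and current velocity) are assumptions made for the example.

    # Illustrative sensor-level policy sketch (assumed observation: laser scan,
    # relative goal position, current velocity). Not the authors' implementation.
    import torch
    import torch.nn as nn

    class SensorLevelPolicy(nn.Module):
        def __init__(self, scan_dim=512, action_dim=2):
            super().__init__()
            # 1D convolutions encode the raw laser scan directly,
            # avoiding any hand-crafted agent-level features.
            self.scan_encoder = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            with torch.no_grad():
                scan_feat = self.scan_encoder(torch.zeros(1, 1, scan_dim)).shape[1]
            # Fully connected head over scan features, relative goal (2D),
            # and current velocity (2D); outputs the mean velocity command.
            self.head = nn.Sequential(
                nn.Linear(scan_feat + 4, 128), nn.ReLU(),
                nn.Linear(128, action_dim),
            )
            # State-independent log standard deviation for a Gaussian policy,
            # as commonly used with policy-gradient methods.
            self.log_std = nn.Parameter(torch.zeros(action_dim))

        def forward(self, scan, goal, velocity):
            z = self.scan_encoder(scan.unsqueeze(1))           # (B, scan_feat)
            mean = self.head(torch.cat([z, goal, velocity], dim=-1))
            return torch.distributions.Normal(mean, self.log_std.exp())

    # Example usage (shapes only):
    # policy = SensorLevelPolicy(scan_dim=512)
    # dist = policy(torch.rand(8, 512), torch.rand(8, 2), torch.rand(8, 2))
    # action = dist.sample()   # (8, 2): e.g. translational and rotational velocity

During training, the sampled velocity command would be executed by each robot in simulation and the Gaussian distribution's log-probability used in a policy-gradient update; at deployment, the mean action can be taken deterministically.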
