Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in decentralized scenarios, where each robot plans its path with only limited observations of the other robots' states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is neither robust nor computationally efficient. In addition, the performance of these methods is not comparable with that of their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands, expressed as movement velocities. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario, multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement-learning algorithm. The learned policy is also integrated into a hybrid control framework to further improve its robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios, including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots whose shapes and dynamics differ from those of the simulated agents, demonstrating the controller's robustness to simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks also provides an excellent solution for safe and effective autonomous navigation of a single robot working in a dense, real-world human crowd. Our learned policy enables a robot to make effective progress through a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca.
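The abstract does not give implementation details, so the following is only a minimal, illustrative sketch of a sensor-level policy of the kind described: a network mapping stacked raw laser scans, the relative goal position, and the current velocity to a velocity command, with a simple safety fallback standing in for the hybrid control framework. The use of PyTorch, the network sizes, the three stacked scans, the two-dimensional (linear, angular) velocity action, and the distance-based fallback threshold are all assumptions for illustration, not the authors' actual design.

    import torch
    import torch.nn as nn

    class SensorLevelPolicy(nn.Module):
        """Maps stacked laser scans, a relative goal, and the current velocity
        to a Gaussian distribution over (linear, angular) velocity commands."""

        def __init__(self, num_beams=512, num_stacked_scans=3):
            super().__init__()
            # 1D convolutions over the scan exploit the spatial ordering of beams.
            self.scan_encoder = nn.Sequential(
                nn.Conv1d(num_stacked_scans, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            with torch.no_grad():
                scan_feat_dim = self.scan_encoder(
                    torch.zeros(1, num_stacked_scans, num_beams)).shape[1]
            # Fuse scan features with the relative goal (2D) and current velocity (2D).
            self.head = nn.Sequential(
                nn.Linear(scan_feat_dim + 4, 256), nn.ReLU(),
                nn.Linear(256, 2),  # mean of (linear velocity, angular velocity)
            )
            # State-independent log standard deviation of the Gaussian policy.
            self.log_std = nn.Parameter(torch.zeros(2))

        def forward(self, scans, goal, velocity):
            features = self.scan_encoder(scans)
            mean = self.head(torch.cat([features, goal, velocity], dim=-1))
            return mean, self.log_std.exp()


    def hybrid_command(learned_action, scan, emergency_dist=0.4):
        """Hybrid-control sketch: override the learned command with a conservative
        stop-and-turn action whenever an obstacle is closer than emergency_dist
        (the threshold and the fallback behaviour are illustrative assumptions)."""
        if scan.min().item() < emergency_dist:
            return torch.tensor([[0.0, 0.5]])
        return learned_action


    if __name__ == "__main__":
        policy = SensorLevelPolicy()
        scans = torch.rand(1, 3, 512)        # three consecutive laser scans
        goal = torch.tensor([[2.0, 0.5]])    # goal in the robot's local frame
        velocity = torch.tensor([[0.3, 0.0]])
        mean, std = policy(scans, goal, velocity)
        action = mean + std * torch.randn_like(mean)    # stochastic action for training
        command = hybrid_command(action, scans[:, -1])  # safety check before execution

Such a stochastic Gaussian policy is the standard parameterization for policy-gradient training (for example with PPO-style updates); the deterministic mean would typically be used at deployment time.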
