Visuomotor Reinforcement Learning for Multirobot Cooperative Navigation

This article investigates the multirobot cooperative navigation problem based on raw visual observations. A fully end-to-end learning framework is presented, which leverages graph neural networks to learn local motion coordination and uses deep reinforcement learning to generate a visuomotor policy that enables each robot to reach its goal without an environment map or global positioning information. Experimental results show that, with a few tens of robots, our approach achieves performance comparable to state-of-the-art imitation-learning-based approaches that use bird's-eye-view state inputs. We also illustrate generalization to crowded and large environments and scalability to ten times the number of training robots. In addition, we demonstrate that our model trained for the multirobot case also improves the success rate of single-robot navigation in unseen environments.
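
The abstract describes the architecture only at a high level: a graph neural network for local coordination feeding a reinforcement-learned visuomotor policy. The excerpt does not give network details, so the following PyTorch fragment is only a minimal sketch of that kind of design, assuming a hypothetical per-robot CNN encoder, a single graph-attention aggregation step over neighbouring robots' shared features, and a categorical policy head; all module names, layer sizes, and the action space are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisuomotorGraphPolicy(nn.Module):
    """Minimal sketch (assumed architecture, not the paper's): each robot encodes
    its raw camera image with a CNN, attends over features shared by neighbouring
    robots, and maps the result to a distribution over discrete motion commands."""

    def __init__(self, action_dim=5, feat_dim=128):
        super().__init__()
        # Per-robot encoder for raw visual observations (3-channel images).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Scores how relevant a neighbour's shared feature is to a given robot.
        self.attn = nn.Linear(2 * feat_dim, 1)
        # Policy head over discrete motion commands (placeholder action space).
        self.policy = nn.Linear(2 * feat_dim, action_dim)
        # Cached attention weights from the last forward pass (for inspection).
        self.last_alpha = None

    def forward(self, images, adjacency):
        # images:    (N, 3, H, W) raw observations of N robots
        # adjacency: (N, N) communication graph, 1 = connected (self-loops included)
        h = self.encoder(images)                              # (N, feat_dim)
        n, d = h.shape
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, d),
                           h.unsqueeze(0).expand(n, n, d)], dim=-1)
        scores = self.attn(pairs).squeeze(-1)                 # (N, N)
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        alpha = F.softmax(scores, dim=-1)                     # attention weights
        self.last_alpha = alpha.detach()
        msg = alpha @ h                                       # aggregated neighbour features
        logits = self.policy(torch.cat([h, msg], dim=-1))
        return torch.distributions.Categorical(logits=logits)
```

A value head and an actor-critic training loop would sit on top of such a module in a full reinforcement learning setup; both are omitted from this sketch.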
Note to Practitioners—With the development of intelligent industrial and logistics systems, robotic transportation systems are widely deployed. However, existing multirobot path coordination and navigation approaches typically rest on assumptions that are hard to satisfy in practical scenarios. This article aims to advance the real-world application of learning-based multirobot cooperative navigation in three ways. First, we introduce an end-to-end reinforcement learning framework instead of the commonly used imitation learning strategy, as the latter needs exhaustive training data to cover all scenarios and lacks the required generalizability. Second, we directly use raw sensor data instead of the commonly used bird's-eye-view semantic observations, as the latter are generally not representative of practical application scenarios from the robot's perspective and cannot resolve occlusion. Third, we interpret the learned model to illustrate which parts of the input and shared observations contribute most to the robots' final actions. This interpretability supports the predictability, and thus safety, of our visuomotor policy in practical applications. To the best of our knowledge, this is the first visuomotor policy in the literature able to coordinate dozens of robots using only raw visual observations in unknown environments, without a map or global localization information. Our future work includes addressing the sim-to-real gap and conducting physical experiments.
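
The interpretability point above, attributing a robot's action to parts of its own input and of the observations shared by its neighbours, can be illustrated on the hypothetical sketch model from the previous block: the attention weights it caches indicate which neighbour's shared feature dominated each decision. This is only an assumed probe, not the authors' analysis tooling.

```python
import torch

# Hypothetical probe on the sketch model above: which neighbour's shared
# feature received the largest attention weight for each robot's action?
net = VisuomotorGraphPolicy()
obs = torch.rand(4, 3, 64, 64)            # 4 robots, random 64x64 camera frames
adj = torch.ones(4, 4)                    # fully connected communication graph
actions = net(obs, adj).sample()          # one motion command index per robot
dominant = net.last_alpha.argmax(dim=-1)  # most-attended neighbour per robot
print(actions.tolist(), dominant.tolist())
```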
