Improving Fast Adaptation for Newcomers in Multi-Robot Reinforcement Learning System

Multi-robot systems have been adopted as a class of ubiquitous intelligent systems that perform critical tasks in various fields. In such systems, multi-agent reinforcement learning (MARL) is regarded as a promising technology for supporting decision-making. However, existing MARL approaches assume either a predefined system configuration or a unified model for agents with identical roles, and thus cannot effectively handle dynamic changes in the number of robots, which are very common in the real world. This "adaptation" problem seriously hinders the development of intelligence in multi-robot systems. In this paper, we propose a novel meta-MADDPG approach that enables new robots to integrate quickly into an existing multi-robot system. We build on the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm and distill the meta-knowledge of a specific robot team by training a meta-actor and a meta-critic simultaneously. The meta-actor learns an experienced policy network that lets new robots take reasonable actions directly when the situation is urgent, while the meta-critic trains a value network that evaluates the current situation to guide the further improvement of new robots. Our experiments on a typical application case (multi-robot collision avoidance) indicate that the meta-knowledge significantly speeds up the adaptation of newcomers. Our source code is available at https://github.com/liyiying/meta-MADDPG.
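
As a rough illustration of the structure described above, the following NumPy sketch shows decentralized actors, a centralized critic in the style of MADDPG, and a newcomer that starts from a meta-actor's parameters. It is a minimal sketch under stated assumptions, not the implementation in the linked repository: the network sizes, the helper names (`init_mlp`, `mlp`, `actor`, `critic`), the random untrained weights, and the plain parameter copy used as a warm start are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
OBS_DIM, ACT_DIM, HIDDEN, N_ROBOTS = 4, 2, 16, 3

def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a two-layer perceptron (assumed architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def actor(params, obs):
    """Decentralized deterministic policy: own observation -> bounded action."""
    return np.tanh(mlp(params, obs))

def critic(params, all_obs, all_actions):
    """Centralized critic: joint observations and actions -> scalar Q-value."""
    joint = np.concatenate([np.ravel(all_obs), np.ravel(all_actions)])
    return mlp(params, joint).item()

# Existing team: one actor per robot and one critic over the joint input.
team_actors = [init_mlp(OBS_DIM, HIDDEN, ACT_DIM) for _ in range(N_ROBOTS)]
team_critic = init_mlp(N_ROBOTS * (OBS_DIM + ACT_DIM), HIDDEN, 1)

# Stand-in for a trained meta-actor (untrained random weights here).
meta_actor = init_mlp(OBS_DIM, HIDDEN, ACT_DIM)

# Assumption: the newcomer warm-starts by copying the meta-actor parameters.
newcomer_actor = {name: w.copy() for name, w in meta_actor.items()}

obs = rng.normal(size=(N_ROBOTS, OBS_DIM))
actions = np.stack([actor(p, o) for p, o in zip(team_actors, obs)])
q_value = critic(team_critic, obs, actions)

new_obs = rng.normal(size=OBS_DIM)
new_action = actor(newcomer_actor, new_obs)
```

In this sketch the newcomer can act immediately from its copied policy, which mirrors the "act directly if the situation is urgent" role the abstract assigns to the meta-actor; how the meta-critic then refines the newcomer is left out because the abstract does not specify it.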
