Paper information - Cooperative Multi-Agent Learning for Navigation via Structured State Abstraction

Cooperative multi-agent reinforcement learning (MARL) for navigation enables agents to cooperate to achieve their navigation goals. With emergent communication, agents learn a communication protocol to coordinate and to share the information needed to complete their navigation tasks. The agents exchange symbols that have no pre-specified usage rules; the meaning and syntax of these symbols emerge through training. Learning a navigation policy together with a communication protocol in a MARL environment is highly complex because of the huge state space to be explored. To cope with this complexity, this work proposes a novel neural network architecture that jointly learns an adaptive state space abstraction and a communication protocol among agents participating in navigation tasks. The goal is an adaptive abstractor that significantly reduces the size of the state space to be explored without degrading policy performance. Simulation results show that the proposed method reaches a better policy, in terms of achievable rewards, in fewer training iterations than when raw states or a fixed state abstraction are used. Moreover, a communication protocol emerges during training that enables the agents to learn better policies within fewer training iterations.
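To make the idea of shrinking the state space concrete, here is a minimal sketch of a fixed state abstraction, the kind of baseline the abstract compares against. It is not the paper's method, whose abstractor is learned and adaptive. The grid world, the names `abstract_state` and `count_states`, and the sizes `GRID_SIZE` and `BLOCK_SIZE` are all assumptions made for illustration.

```python
from itertools import product


def abstract_state(position, block_size):
    """Map a fine-grid cell (row, col) to the coarse block that contains it."""
    row, col = position
    return (row // block_size, col // block_size)


def count_states(grid_size, block_size=1):
    """Count distinct single-agent position states after abstraction."""
    cells = product(range(grid_size), repeat=2)
    return len({abstract_state(cell, block_size) for cell in cells})


GRID_SIZE = 16   # hypothetical 16x16 map (assumption, not from the paper)
BLOCK_SIZE = 4   # hypothetical fixed abstraction: merge 4x4 cells into one state

raw_states = count_states(GRID_SIZE)                   # every cell is its own state
abstract_states = count_states(GRID_SIZE, BLOCK_SIZE)  # cells merged into blocks

print(f"raw: {raw_states} states, abstracted: {abstract_states} states")
```

A fixed block size applies the same resolution everywhere on the map. An adaptive abstractor, as proposed in the abstract, would instead choose how coarsely to represent the state during training.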

Mohammed S. Elbamby | M. Bennis | S. Samarakoon | Mohamed K. Abdelaziz
