Independent Multi-agent Reinforcement Learning Using Common Knowledge

Many recent multi-agent reinforcement learning algorithms use centralized training with decentralized execution (CTDE), which makes training depend on global information and suffer from the curse of dimensionality. Independent learning (IL) approaches are structurally simple and can be deployed more easily to a wide range of multi-agent scenarios, but they can solve only relatively simple problems because of environment non-stationarity and partial observability. Motivated by this, we let IL agents compute common-knowledge information and fuse it with their observations to exploit common knowledge explicitly. In addition, we choose a network structure suited to the characteristics of IL, using convolutional and GRU layers. Based on these two improvements, we implement two IL algorithms. In our experiments, the implemented algorithms show significant performance improvements over the original IL algorithms, further close the gap to CTDE, and outperform multi-agent common knowledge reinforcement learning.
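The agent network described above (a convolutional encoder over the local observation, fusion with a common-knowledge feature vector, and a GRU for memory under partial observability) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; the class name, layer sizes, and the shape of the common-knowledge vector are all assumptions.

```python
import torch
import torch.nn as nn

class CKAgentNet(nn.Module):
    """Hypothetical sketch of an IL agent network: conv encoder over the
    local observation, fusion with a common-knowledge (CK) vector, and a
    GRU cell that carries memory across time steps. All dimensions are
    illustrative assumptions, not values from the paper."""

    def __init__(self, obs_channels=3, ck_dim=16, hidden_dim=64, n_actions=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(obs_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # fixed-size feature map for any input size
            nn.Flatten(),
        )
        conv_out = 16 * 4 * 4
        # GRU input is the fused (observation features + common knowledge) vector
        self.gru = nn.GRUCell(conv_out + ck_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, ck, h):
        # obs: (B, C, H, W) local observation; ck: (B, ck_dim) common knowledge;
        # h: (B, hidden_dim) recurrent state from the previous step
        x = torch.cat([self.conv(obs), ck], dim=-1)  # fuse observation and CK
        h = self.gru(x, h)
        return self.q_head(h), h

# usage: one forward step for a batch of 2 agents' observations
net = CKAgentNet()
obs = torch.zeros(2, 3, 8, 8)
ck = torch.zeros(2, 16)
h = torch.zeros(2, 64)
q, h = net(obs, ck, h)
print(q.shape)  # torch.Size([2, 5])
```

Because each agent trains independently on this network, no centralized mixing module is needed; the common-knowledge vector is the only channel through which shared information enters the value estimate.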
