The Partially Observable Asynchronous Multi-Agent Cooperation Challenge

Multi-agent reinforcement learning (MARL) has received increasing attention for its applications in various domains. Researchers have paid particular attention to its partially observable and cooperative settings, which reflect real-world requirements. To test the performance of different algorithms, standardized environments have been designed, such as the StarCraft Multi-Agent Challenge, one of the most successful MARL benchmarks. To the best of our knowledge, most current environments are synchronous: agents execute actions at the same pace. However, heterogeneous agents usually have their own action spaces, and there is no guarantee that actions from different agents take the same number of cycles to execute, which leads to asynchronous multi-agent cooperation. Inspired by the Wargame, a confrontation game between two armies abstracted from real-world environments, we propose the first Partially Observable Asynchronous multi-agent Cooperation challenge (POAC) for the MARL community. Specifically, POAC supports two teams of heterogeneous agents fighting each other, where each agent selects actions based on its own observations and cooperates asynchronously with its allies. Moreover, POAC is a lightweight, flexible, and easy-to-use environment that users can configure to meet different experimental requirements, such as self-play mode and human-AI mode. Along with our benchmark, we offer six game scenarios of varying difficulty with built-in rule-based AI opponents. Finally, since most MARL algorithms are designed for synchronous agents, we revise several representative algorithms for the asynchronous setting, and their relatively poor experimental results confirm the challenge that POAC poses. Source code is released at http://turingai.ia.ac.cn/data center/show.
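To make the asynchronous setting described in the abstract concrete, the following is a minimal toy sketch, not POAC's actual API. The unit types, action names, tick durations, and the `Agent` and `run` helpers are all hypothetical and chosen only for illustration. It shows the core idea: actions of heterogeneous agents take different numbers of environment ticks, so agents reach their decision points at different times.

```python
import random
from dataclasses import dataclass

# Hypothetical action durations in environment ticks (not taken from POAC).
ACTION_TICKS = {
    "infantry": {"move": 3, "shoot": 1},
    "tank": {"move": 1, "shoot": 2},
}


@dataclass
class Agent:
    name: str
    kind: str
    busy_until: int = 0  # tick at which the current action finishes
    decisions: int = 0   # number of times this agent chose an action


def run(agents, horizon, seed=0):
    """Advance a toy clock; an agent picks a new action only when idle."""
    rng = random.Random(seed)
    log = []
    for tick in range(horizon):
        for agent in agents:
            if tick < agent.busy_until:
                continue  # still executing the previous action
            # Random policy as a stand-in for a learned one.
            action = rng.choice(sorted(ACTION_TICKS[agent.kind]))
            agent.busy_until = tick + ACTION_TICKS[agent.kind][action]
            agent.decisions += 1
            log.append((tick, agent.name, action))
    return log


agents = [Agent("a0", "infantry"), Agent("a1", "tank")]
log = run(agents, horizon=12)
for tick, name, action in log:
    print(tick, name, action)
```

In a synchronous benchmark every agent would appear in the log at every tick. Here the log entries of the two agents drift apart, which is the property that algorithms designed for synchronous agents have to be adapted to handle.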

Kaiqi Huang | Bin Liang | Qiyue Yin | Meng Yao | Jun Yang | Tongtong Yu | Shengqi Shen | Junge Zhang
