The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm, but it is significantly less utilized than off-policy algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant that adopts a centralized value function. Using a single-GPU desktop machine, we show that MAPPO achieves performance comparable to the state of the art in three popular multi-agent testbeds: the Particle World environments, the StarCraft II Micromanagement Tasks, and the Hanabi Challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that, compared to off-policy baselines, MAPPO achieves comparable or better sample complexity as well as substantially faster running time. Finally, through ablation studies, we identify the five factors most influential to MAPPO's practical performance.
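
To make the construction concrete, below is a minimal PyTorch-style sketch of a per-batch MAPPO update: each agent's policy is trained with PPO's clipped surrogate objective on its local observation, while a single centralized critic conditions on the shared global state. The function and tensor names, the assumption that `policy(obs)` returns a `torch.distributions` object, and the plain squared-error value loss are illustrative assumptions for exposition, not the paper's reference implementation (which also relies on implementation details such as value normalization).

```python
import torch

def mappo_losses(policy, critic, obs, global_state, actions,
                 old_log_probs, advantages, returns, clip_eps=0.2):
    """One MAPPO update for a batch of agent experience (illustrative sketch).

    obs:          per-agent local observations        [batch, obs_dim]
    global_state: shared global state for the critic  [batch, state_dim]
    advantages:   advantage estimates (e.g. from GAE) [batch]
    returns:      empirical return targets            [batch]
    """
    # Decentralized actor: standard PPO clipped surrogate objective,
    # computed from each agent's local observation.
    new_log_probs = policy(obs).log_prob(actions)
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Centralized critic: the value function conditions on the global
    # state, which is what distinguishes MAPPO from independent PPO (IPPO).
    value_loss = (critic(global_state).squeeze(-1) - returns).pow(2).mean()

    return policy_loss, value_loss
```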
