QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning

QTRAN is a multi-agent reinforcement learning (MARL) algorithm that can learn the largest class of joint-action value functions to date. However, despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottleneck of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct and that QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantee. In particular, QTRAN++ achieves new state-of-the-art performance in the SMAC environment. The code will be released.

Jinwoo Shin | Yung Yi | Kyunghwan Son | Sungsoo Ahn | Roben Delos Reyes
