Value-Decomposition Multi-Agent Actor-Critics

Exploiting extra state information, such as the global state available during centralized training, has been an active research topic in multi-agent reinforcement learning (MARL). QMIX factors the joint action-value through a mixing network whose weights are constrained to be non-negative, and it achieves, by a clear margin, the best performance to date on the StarCraft II micromanagement benchmark. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a parallel training paradigm that improves training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value decomposition to actor-critic methods that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on StarCraft II micromanagement tasks and show that the proposed framework improves median performance over other actor-critic methods. Finally, a set of ablation experiments identifies the key factors contributing to the performance of VDACs.
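The abstract describes the architecture only at a high level; the paper itself defines VDACs precisely. As a rough, non-authoritative sketch of the core idea, the PyTorch snippet below combines per-agent state values into a global value through state-conditioned non-negative mixing weights, so that a single A2C-style advantage can drive every agent's policy gradient. All names here (e.g. `VDACMixer`, `a2c_losses`) are our own illustrative choices, not the authors' code, and the hypernetwork mixer is one plausible instantiation of the decomposition (a simple sum of per-agent values would be another).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VDACMixer(nn.Module):
    """Illustrative value-decomposition critic: mixes per-agent state
    values V_i(o_i) into a global V_tot(s) using non-negative weights,
    so V_tot is monotonic in each V_i (a QMIX-style factorisation,
    applied here to state values rather than action-values)."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks: the global state s produces the mixing
        # weights and biases; abs() below enforces non-negativity.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, 1))
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_values: torch.Tensor,
                state: torch.Tensor) -> torch.Tensor:
        # agent_values: (batch, n_agents); state: (batch, state_dim)
        b = agent_values.size(0)
        w1 = self.hyper_w1(state).abs().view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(agent_values.unsqueeze(1) @ w1 + b1)  # (b, 1, embed)
        w2 = self.hyper_w2(state).abs().view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (hidden @ w2 + b2).view(b)                    # V_tot(s)

def a2c_losses(mixer, agent_values, state, log_probs, returns):
    """A2C-style update sketch: a shared advantage computed from
    V_tot trains every agent's policy, while the mixed critic
    regresses V_tot toward an n-step return."""
    v_tot = mixer(agent_values, state)              # (batch,)
    advantage = (returns - v_tot).detach()          # baseline, no actor grad
    policy_loss = -(log_probs.sum(dim=-1) * advantage).mean()
    value_loss = F.mse_loss(v_tot, returns)
    return policy_loss, value_loss
```

In use, per-agent actors and value heads (not shown) would produce `log_probs` and `agent_values` from local observation histories, while the mixer alone sees the global state; this is what keeps execution decentralized even though the critic exploits extra state information during training.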
