Multi-Agent Collaboration via Reward Attribution Decomposition

Recent advances in multi-agent reinforcement learning (MARL) have achieved superhuman performance in games such as Quake III Arena and Dota 2. Unfortunately, these techniques require orders of magnitude more training rounds than humans need, and they do not generalize to new agent configurations, even within the same game. In this work, we propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance on the StarCraft Multi-Agent Challenge and supports ad hoc team play. We first formulate multi-agent collaboration as a joint optimization over reward assignment and show that each agent has an approximately optimal policy that decomposes into two parts: one that depends only on the agent's own state, and another that is related to the states of nearby agents. Building on this finding, CollaQ decomposes the Q-function of each agent into a self term and an interactive term, with a Multi-Agent Reward Attribution (MARA) loss that regularizes the training. Evaluated on various StarCraft maps, CollaQ outperforms existing state-of-the-art techniques (i.e., QMIX, QTRAN, and VDN), improving the win rate by 40% with the same number of samples. In the more challenging ad hoc team play setting (i.e., reweighting, adding, or removing units without retraining or fine-tuning), CollaQ outperforms the previous state of the art by over 30%.
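The described decomposition and MARA regularizer admit a compact sketch. Below is a minimal PyTorch illustration, assuming a flat observation vector in which the entries describing other agents can be zeroed out to form an "alone" observation; the names (CollaQAgent, mara_loss) and the two-MLP architecture are illustrative assumptions, not the paper's exact design.

```python
import torch.nn as nn

class CollaQAgent(nn.Module):
    """Sketch of a per-agent Q-function split into a self term and an
    interactive term, as described in the abstract. Network sizes and
    observation layout are assumptions for illustration."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        # Self term: depends only on the agent's own state.
        self.q_alone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )
        # Interactive term: depends on the full observation,
        # including the states of nearby agents.
        self.q_collab = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs_full, obs_self_only):
        # obs_self_only is assumed to be obs_full with the entries
        # describing other agents zeroed out.
        return self.q_alone(obs_self_only) + self.q_collab(obs_full)

def mara_loss(agent, obs_self_only):
    """MARA-style regularizer (a sketch): push the interactive term
    toward zero when no other agents are visible, so it captures only
    genuinely interactive value."""
    q_collab_alone = agent.q_collab(obs_self_only)
    return (q_collab_alone ** 2).mean()
```

In training, a term like `mara_loss(agent, obs_self_only)` would be added to the usual TD objective; the intended effect is that the self term alone explains the agent's value in isolation, which is what allows the policy to transfer when teammates are reweighted, added, or removed.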
