ALMA: Hierarchical Learning for Composite Multi-Agent Tasks

Despite significant progress on multi-agent reinforcement learning (MARL) in recent years, coordination in complex domains remains a challenge. Work in MARL often focuses on solving tasks where agents interact with all other agents and entities in the environment; however, we observe that real-world tasks are often composed of several isolated instances of local agent interaction (subtasks), and each agent can meaningfully focus on one subtask to the exclusion of all else in the environment. In these composite tasks, successful policies can often be decomposed into two levels of decision-making: agents are allocated to specific subtasks, and each agent acts productively towards its assigned subtask alone. This decomposed decision-making provides a strong structural inductive bias, significantly reduces agent observation spaces, and encourages subtask-specific policies to be reused and composed during training, rather than treating each new composition of subtasks as unique. We introduce ALMA, a general learning method for taking advantage of these structured tasks. ALMA simultaneously learns a high-level subtask allocation policy and low-level agent policies. We demonstrate that ALMA learns sophisticated coordination behavior in a number of challenging environments, outperforming strong baselines. ALMA's modularity also enables it to better generalize to new environment configurations. Finally, we find that while ALMA can integrate separately trained allocation and action policies, the best performance is obtained only by training all components jointly.
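
As a rough illustration of the two-level structure described above, the sketch below separates decision-making into a high-level allocation step and low-level subtask-conditioned actions. It is a minimal sketch only: the class names, the linear scoring, and the greedy per-agent assignment are assumptions for illustration, not ALMA's actual learned architecture (which trains both levels jointly with reinforcement learning).

```python
# Minimal sketch of the two-level decomposition: a high-level policy
# allocates agents to subtasks, and each agent then acts on the local
# observation of its assigned subtask only. Illustrative assumption,
# not ALMA's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

class AllocationPolicy:
    """High level: scores every (agent, subtask) pair from the global state."""
    def __init__(self, n_agents, n_subtasks, state_dim):
        self.n_agents, self.n_subtasks = n_agents, n_subtasks
        self.W = rng.normal(size=(state_dim, n_agents * n_subtasks))

    def allocate(self, global_state):
        scores = (global_state @ self.W).reshape(self.n_agents, self.n_subtasks)
        return scores.argmax(axis=1)  # greedy: one subtask index per agent

class AgentPolicy:
    """Low level: acts from the assigned subtask's local observation alone."""
    def __init__(self, obs_dim, n_actions):
        self.W = rng.normal(size=(obs_dim, n_actions))

    def act(self, subtask_obs):
        return int((subtask_obs @ self.W).argmax())

# Toy decision step: 3 agents, 2 subtasks.
n_agents, n_subtasks, state_dim, obs_dim, n_actions = 3, 2, 8, 4, 5
alloc = AllocationPolicy(n_agents, n_subtasks, state_dim)
agents = [AgentPolicy(obs_dim, n_actions) for _ in range(n_agents)]

global_state = rng.normal(size=state_dim)
subtask_obs = rng.normal(size=(n_subtasks, obs_dim))  # one local view per subtask

assignment = alloc.allocate(global_state)
actions = [agents[i].act(subtask_obs[assignment[i]]) for i in range(n_agents)]
print("assignment:", assignment, "actions:", actions)
```

The point of the structure is that each low-level policy conditions only on its assigned subtask's local observation, which is what shrinks observation spaces and lets subtask-specific behavior be reused across new compositions of subtasks.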
