A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model. We propose different combinations of inference procedures and scoring models able to represent coordination patterns of increasing complexity. The resulting assignment policy can be efficiently learned on small problem instances and readily reused in problems with more agents and tasks (i.e., zero-shot generalization). We report experimental results on a toy search and rescue problem and on several target selection scenarios in StarCraft: Brood War, in which our model significantly outperforms strong rule-based baselines on instances with 5 times more agents and tasks than those seen during training.
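To make the pipeline described above concrete, the following is a minimal sketch (not the paper's implementation) of the two components the abstract names: a scoring model over agent–task pairs and an inference procedure that turns those scores into an assignment. The bilinear scorer, the feature shapes, and the choice of the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) as the inference procedure are illustrative assumptions; the paper's scoring models and inference procedures may be richer and need not be one-to-one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_scores(agent_feats, task_feats, W):
    """Hypothetical scoring model: score[i, j] = agent_i^T W task_j.
    In the paper this role is played by a learned scoring model."""
    return agent_feats @ W @ task_feats.T


def assign(agent_feats, task_feats, W):
    """Illustrative inference procedure: pick the one-to-one agent-task
    assignment that maximizes the sum of learned pairwise scores."""
    scores = pairwise_scores(agent_feats, task_feats, W)
    # linear_sum_assignment minimizes cost, so negate the scores.
    rows, cols = linear_sum_assignment(-scores)
    return list(zip(rows.tolist(), cols.tolist()))


# Toy usage: 3 agents, 4 tasks, 2-d features; W stands in for learned parameters.
rng = np.random.default_rng(0)
agents = rng.normal(size=(3, 2))
tasks = rng.normal(size=(4, 2))
W = rng.normal(size=(2, 2))
print(assign(agents, tasks, W))
```

Because the scoring model only looks at (agent, task) pairs, the same parameters apply regardless of how many agents and tasks are present, which is the property that allows training on small instances and reusing the policy zero-shot on larger ones.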
