Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games

Many artificial intelligence (AI) applications often require multiple intelligent agents to work in a collaborative effort. Efficient learning for intra-agent communication and coordination is an indispensable step towards general AI. In this paper, we take StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of actor-critic formulation. We show that BiCNet can handle different types of combats with arbitrary numbers of AI agents for both sides. Our analysis demonstrates that without any supervisions such as human demonstrations or labelled data, BiCNet could learn various types of advanced coordination strategies that have been commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance, and possesses potential values for large-scale real-world applications.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[3]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[4]  N. Franks,et al.  Social Evolution in Ants , 2019 .

[5]  Jfirgen Schmidhuber,et al.  A GENERAL METHOD FOR MULTI-AGENT REINFORCEMENT LEARNING IN UNRESTRICTED ENVIRONMENTS , 1996 .

[6]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[7]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[8]  John Riedl,et al.  Recommender systems in e-commerce , 1999, EC '99.

[9]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[10]  Martin Lauer,et al.  An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[11]  Daniel Kudenko,et al.  Reinforcement learning of coordination in cooperative multi-agent systems , 2002, AAAI/IAAI.

[12]  Matthijs T. J. Spaan,et al.  High level coordination of agents based on multiagent Markov decision processes with roles , 2002 .

[13]  A. ADoefaa,et al.  ? ? ? ? f ? ? ? ? ? , 2003 .

[14]  Martin Lauer,et al.  Reinforcement learning for stochastic cooperative multi-agent-systems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[15]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[16]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[17]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[18]  Kagan Tumer,et al.  Analyzing and visualizing multiagent rewards in dynamic and stochastic domains , 2008, Autonomous Agents and Multi-Agent Systems.

[19]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[20]  Santiago Ontañón,et al.  A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft , 2013, IEEE Transactions on Computational Intelligence and AI in Games.

[21]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[22]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[23]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[24]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[25]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[27]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[29]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[30]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[31]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[32]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[33]  Jordan L. Boyd-Graber,et al.  Opponent Modeling in Deep Reinforcement Learning , 2016, ICML.

[34]  Florian Richoux,et al.  TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.

[35]  Nicolas Usunier,et al.  Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks , 2016, ArXiv.

[36]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[37]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[38]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[39]  Tuomas Sandholm,et al.  Safe and Nested Subgame Solving for Imperfect-Information Games , 2017, NIPS.

[40]  Jun Wang,et al.  Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting , 2016, Found. Trends Inf. Retr..

[41]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[42]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[43]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[44]  金錫俊 1990 , 1990, Literatur in der SBZ/DDR.