Modular Architecture for StarCraft II with Deep Reinforcement Learning

We present a novel modular architecture for StarCraft II AI. The architecture splits responsibilities across multiple modules, each controlling one aspect of the game, such as build-order selection or tactics. A centralized scheduler reviews the macros suggested by all modules and decides their order of execution. An updater keeps track of environment changes and instantiates macros into sequences of executable actions. Modules in this framework can be optimized independently or jointly via human design, planning, or reinforcement learning. We apply deep reinforcement learning to train two of the agent's six modules with self-play, achieving 94% and 87% win rates against the "Harder" (level 5) built-in Blizzard bot in Zerg vs. Zerg matches, with and without fog-of-war.
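
The abstract describes a pipeline in which independent modules propose macros, a centralized scheduler decides their execution order, and an updater expands the chosen macros into executable actions while tracking environment changes. The sketch below illustrates that control flow in Python; it is not the authors' implementation, and every class, method, and the priority-based ordering rule is an illustrative assumption.

```python
# Minimal sketch of the modular control flow described in the abstract.
# All names (Macro, Module, Scheduler, Updater, agent_step) and the
# priority-based ordering rule are assumptions, not the paper's API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Macro:
    """A high-level action (e.g. 'expand', 'attack') proposed by a module."""
    name: str
    priority: float


class Module(ABC):
    """Controls one aspect of the game, such as build-order selection or tactics."""

    @abstractmethod
    def propose(self, observation: Any) -> List[Macro]:
        """Suggest macros given the current observation."""


class Scheduler:
    """Reviews macros suggested by all modules and decides their order of execution."""

    def order(self, macros: List[Macro]) -> List[Macro]:
        # Placeholder policy: execute higher-priority macros first.
        return sorted(macros, key=lambda m: m.priority, reverse=True)


class Updater:
    """Keeps track of environment changes and instantiates macros into actions."""

    def instantiate(self, macro: Macro, observation: Any) -> List[str]:
        # A real agent would emit game API actions here; strings stand in for them.
        return [f"do_{macro.name}"]


def agent_step(modules: List[Module], scheduler: Scheduler,
               updater: Updater, observation: Any) -> List[str]:
    """One decision step: gather proposals, schedule them, expand into actions."""
    proposals = [m for module in modules for m in module.propose(observation)]
    actions: List[str] = []
    for macro in scheduler.order(proposals):
        actions.extend(updater.instantiate(macro, observation))
    return actions


class BuildOrderModule(Module):
    """Toy module that could be hand-designed or trained with reinforcement learning."""

    def propose(self, observation: Any) -> List[Macro]:
        return [Macro("expand", priority=0.8), Macro("build_drone", priority=0.5)]


if __name__ == "__main__":
    actions = agent_step([BuildOrderModule()], Scheduler(), Updater(), observation=None)
    print(actions)  # ['do_expand', 'do_build_drone']
```

In this arrangement each module can be swapped out or trained independently (the paper trains two of the six modules with reinforcement learning and self-play) without touching the scheduler or the updater.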
