Paper information - TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game

StarCraft II (SC2) is widely considered the most challenging Real Time Strategy (RTS) game. The underlying challenges include a large observation space, a huge (continuous and infinite) action space, partial observations, simultaneous moves by all players, and long-horizon delayed rewards for local decisions. To push the frontier of AI research, DeepMind and Blizzard jointly developed the StarCraft II Learning Environment (SC2LE) as a testbed for complex decision-making systems. SC2LE provides a few mini-games such as MoveToBeacon, CollectMineralShards, and DefeatRoaches, in which some AI agents have reached the performance level of professional human players. In full games, however, current AI agents are still far from professional human-level performance. To bridge this gap, we present two full-game AI agents in this paper: TStarBot1, which is based on deep reinforcement learning over a flat action structure, and TStarBot2, which is based on hard-coded rules over a hierarchical action structure. Both TStarBot1 and TStarBot2 can defeat the built-in AI agents from level 1 to level 10 in a full game (a 1v1 Zerg-vs-Zerg game on the AbyssalReef map); note that levels 8, 9, and 10 are cheating agents with unfair advantages such as full vision of the whole map and boosted resource harvesting. To the best of our knowledge, this is the first public work to investigate AI agents that can defeat the built-in AI in the StarCraft II full game.
