On Reinforcement Learning for Full-length Game of StarCraft

StarCraft II poses a grand challenge for reinforcement learning. The main difficulties include a huge state space, a varying action space, and a long horizon. In this paper, we investigate a set of reinforcement learning techniques for the full-length game of StarCraft II. We adopt a hierarchical approach in which the hierarchy involves two levels of abstraction. The first is a set of macro-actions extracted from expert demonstration trajectories, which reduces the action space by an order of magnitude while remaining effective. The second is a two-layer hierarchical architecture, which is modular and easy to scale. We also investigate a curriculum transfer learning approach that trains the agent against opponents of increasing difficulty, starting from the simplest one. On a 64×64 map with restricted units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a win rate of more than 99% against the difficulty level-1 built-in AI. With the curriculum transfer learning algorithm and a mixture of combat models, we achieve a win rate of over 93% against the most difficult non-cheating built-in AI (level-7) within days. We hope this study sheds some light on future research in large-scale reinforcement learning.
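To make the two ideas in the abstract concrete, here is a minimal sketch of how a two-layer hierarchy over macro-actions and a difficulty curriculum could fit together. This is not the authors' implementation: it assumes a gym-like environment API, and every name in it (SubPolicy, Controller, train_with_curriculum, env_factory, the step interval k, the placeholder macro-actions) is a hypothetical illustration.

```python
# A minimal sketch, assuming a gym-like environment; all names below are
# hypothetical illustrations, not the paper's actual code.
import random

# Macro-actions of the kind the paper mines from expert replays
# (these four are invented placeholders).
MACRO_ACTIONS = ["build_worker", "build_army", "attack", "defend"]

class SubPolicy:
    """Low-level policy: maps an observation to a macro-action."""
    def act(self, obs):
        # A trained network would score macro-actions here; we sample
        # uniformly just to keep the sketch self-contained.
        return random.choice(MACRO_ACTIONS)

class Controller:
    """High-level policy: every k environment steps, it picks which
    sub-policy controls the agent, giving a two-layer hierarchy."""
    def __init__(self, sub_policies, k=8):
        self.sub_policies = sub_policies
        self.k = k
        self.current = 0

    def act(self, obs, t):
        if t % self.k == 0:
            # A trained high-level network would choose here.
            self.current = random.randrange(len(self.sub_policies))
        return self.sub_policies[self.current].act(obs)

def train_with_curriculum(env_factory, agent, levels=range(1, 8),
                          episodes_per_level=1000):
    """Curriculum transfer: keep the same agent weights while the
    built-in-AI difficulty rises from level 1 to level 7."""
    for level in levels:
        env = env_factory(difficulty=level)
        for _ in range(episodes_per_level):
            obs, t, done = env.reset(), 0, False
            while not done:
                obs, reward, done = env.step(agent.act(obs, t))
                t += 1
                # agent.update(...) would apply the RL update here.

# Usage: a controller switching between two sub-policies.
agent = Controller([SubPolicy(), SubPolicy()], k=8)
```

The design point mirrored here is temporal abstraction: the controller decides only every k steps over a small macro-action vocabulary, so the effective horizon and action space at the top level are both sharply reduced, which is what makes the full-length game tractable in this setting.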
