Paper information - StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning

Real-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. Then, a parameter-sharing multi-agent gradient-descent Sarsa($\lambda$) algorithm is proposed to train the units. The learning policy is shared among our units to encourage cooperative behaviors. We use a neural network as a function approximator to estimate the action–value function, and propose a reward function to help units balance movement and attack. In addition, a transfer learning method is used to extend our model to more difficult scenarios, which accelerates the training process and improves the learning performance. In small-scale scenarios, our units successfully learn to fight and defeat the built-in AI with a 100% win rate. In large-scale scenarios, the curriculum transfer learning method is used to progressively train a group of units, and it shows superior performance over some baseline methods in target scenarios. With reinforcement learning and curriculum transfer learning, our units are able to learn appropriate strategies in StarCraft micromanagement scenarios.
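To make the parameter-sharing idea concrete, the following is a minimal sketch of gradient-descent Sarsa($\lambda$) in which every unit updates one shared set of weights. It is not the paper's implementation: it assumes a linear action-value function in place of the neural network the abstract describes, accumulating eligibility traces, and illustrative hyperparameters. The class name `SharedSarsaLambda`, its methods, and the feature vectors are hypothetical.

```python
import numpy as np


class SharedSarsaLambda:
    """Gradient-descent Sarsa(lambda) with one weight matrix shared by all units.

    Assumption: Q(s, a) = w[a] . s (linear), standing in for a neural network.
    """

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
        self.w = np.zeros((n_actions, n_features))  # shared by every unit
        self.alpha = alpha      # learning rate (illustrative value)
        self.gamma = gamma      # discount factor (illustrative value)
        self.lam = lam          # trace decay (illustrative value)
        self.epsilon = epsilon  # exploration rate (illustrative value)
        self.rng = np.random.default_rng(seed)
        self.traces = {}        # one eligibility trace per unit id

    def start_episode(self, unit_ids):
        # Traces are per unit and reset each episode; weights persist.
        self.traces = {u: np.zeros_like(self.w) for u in unit_ids}

    def act(self, features):
        # Epsilon-greedy action selection over the shared Q estimate.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.w.shape[0]))
        return int(np.argmax(self.w @ features))

    def update(self, unit_id, s, a, r, s_next, a_next, done):
        # On-policy TD error: the target uses the action actually taken next.
        q = self.w[a] @ s
        target = r if done else r + self.gamma * (self.w[a_next] @ s_next)
        delta = target - q
        # Decay this unit's trace, then add the gradient of Q w.r.t. w[a].
        e = self.traces[unit_id]
        e *= self.gamma * self.lam
        e[a] += s
        # Every unit writes into the same weights: this is the sharing.
        self.w += self.alpha * delta * e
        return delta


# Two units experience the same terminal transition in turn.
learner = SharedSarsaLambda(n_features=2, n_actions=2, alpha=0.5)
learner.start_episode(unit_ids=[0, 1])
state = np.array([1.0, 0.0])
delta_unit0 = learner.update(0, state, 0, 1.0, state, 0, done=True)
delta_unit1 = learner.update(1, state, 0, 1.0, state, 0, done=True)
```

In this sketch the second unit's TD error is smaller than the first unit's, because the first unit's update has already moved the shared weights toward the reward. That is the sense in which experience gathered by one unit benefits the others.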

[30] Pierre Bessière,et al. Multiscale Bayesian Modeling for RTS Games: An Application to StarCraft AI , 2016, IEEE Transactions on Computational Intelligence and AI in Games.

[31] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.

[32] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[33] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[34] Nicolas Usunier,et al. Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks , 2016, ArXiv.

[35] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[36] Tang Zhen-tao,et al. Recent progress of deep reinforcement learning : from AlphaGo to AlphaGo Zero , 2018 .

[37] Peng Peng,et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.

[38] Daniela Zaharie,et al. Neuroevolution based multi-agent system for micromanagement in real-time strategy games , 2012, BCI '12.

[39] Yuandong Tian,et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning , 2016, ICLR.

[40] Dongbin Zhao,et al. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[41] Haitao Wang,et al. Deep reinforcement learning with experience replay based on SARSA , 2016, 2016 IEEE Symposium Series on Computational Intelligence (SSCI).

[42] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[43] Dongbin Zhao,et al. Cooperative reinforcement learning for multiple units combat in starCraft , 2017, 2017 IEEE Symposium Series on Computational Intelligence (SSCI).

[44] Carlos Cotta,et al. A review of computational intelligence in RTS games , 2013, 2013 IEEE Symposium on Foundations of Computational Intelligence (FOCI).

[45] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[46] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[47] Glen Robertson,et al. A Review of Real-Time Strategy Game AI , 2014, AI Mag..

[48] Junwei Gao,et al. FMRQ—A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks , 2017, IEEE Transactions on Cybernetics.

[49] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[50] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.

[51] Jun Wang,et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.

[52] Ming Tan,et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[53] Dongbin Zhao,et al. Move prediction in Gomoku using deep learning , 2016, 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC).

[54] Hai Tao Wang,et al. Review of deep reinforcement learning and discussions on the development of computer Go , 2016 .

[55] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.

[56] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[57] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[58] Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.

[59] Chaomin Luo,et al. Policy gradient methods with Gaussian process modelling acceleration , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[60] Qichao Zhang,et al. Event-Based Robust Control for Uncertain Nonlinear Systems Using Adaptive Dynamic Programming , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[61] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.

[62] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[63] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[64] Marco Wiering,et al. Connectionist reinforcement learning for intelligent unit micro management in StarCraft , 2011, The 2011 International Joint Conference on Neural Networks.