Connectionist reinforcement learning for intelligent unit micro management in StarCraft

Real-time strategy (RTS) games are among the most popular genres on the PC market and offer a dynamic environment with many interacting agents. The core strategies a player must develop in these games are unit micromanagement, build order, resource management, and the overall game tactic. Unfortunately, current games rely on scripted, fixed behaviors for their artificial intelligence (AI), so players can easily learn the countermeasures needed to defeat it. In this paper, we describe a neural-network-based system that controls a group of units of the same type in the popular game StarCraft. Guided by the neural networks, each unit either selects an enemy unit to attack or retreats from the battlefield. The system combines reinforcement learning with neural networks using online Sarsa and neural-fitted Sarsa, both with a short-term-memory reward function. We also present an incremental learning method that scales training to larger scenarios with more units by reusing neural networks trained on smaller scenarios. Additionally, we developed a novel sensing system that feeds environment data to the neural networks through separate vision grids. Simulation results show superior performance against StarCraft's hand-crafted AI scripts.
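The abstract names two algorithmic ingredients: an on-policy Sarsa update applied to a neural-network Q-function, and a grid-based encoding of each unit's surroundings. The sketch below is a minimal Python illustration of both pieces under stated assumptions; the network architecture, hyperparameters, the `env` interface, and the `vision_grid` encoding are illustrative stand-ins, not the paper's actual design.

```python
# Minimal sketch of online Sarsa with a neural-network Q-function,
# in the spirit of the approach the abstract describes. All sizes,
# learning rates, and the environment interface are assumptions.
import numpy as np

rng = np.random.default_rng(0)

class QNetwork:
    """One-hidden-layer MLP approximating Q(s, a) over a fixed action set."""
    def __init__(self, n_inputs, n_actions, n_hidden=32, lr=0.01):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)
        self.lr = lr

    def forward(self, s):
        self.h = np.tanh(self.W1 @ s + self.b1)   # hidden activations
        return self.W2 @ self.h + self.b2          # Q-values, one per action

    def update(self, s, a, td_error):
        # Semi-gradient TD step: w += lr * td_error * dQ(s, a)/dw.
        self.forward(s)  # refresh self.h for state s
        grad_h = self.W2[a] * (1.0 - self.h ** 2)  # backprop through tanh
        self.W2[a] += self.lr * td_error * self.h
        self.b2[a] += self.lr * td_error
        self.W1 += self.lr * td_error * np.outer(grad_h, s)
        self.b1 += self.lr * td_error * grad_h

def epsilon_greedy(q_values, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def sarsa_episode(env, net, gamma=0.95):
    """One on-policy Sarsa episode: update toward r + gamma * Q(s', a').
    `env` is a hypothetical environment with reset() -> s and
    step(a) -> (s_next, reward, done)."""
    s = env.reset()
    a = epsilon_greedy(net.forward(s))
    done = False
    while not done:
        s_next, r, done = env.step(a)
        a_next = epsilon_greedy(net.forward(s_next))
        target = r if done else r + gamma * net.forward(s_next)[a_next]
        net.update(s, a, target - net.forward(s)[a])
        s, a = s_next, a_next

def vision_grid(unit_pos, other_positions, grid_size=8, cell=32):
    """Flattened occupancy grid centred on the unit; an illustrative
    stand-in for the paper's separate vision grids."""
    grid = np.zeros((grid_size, grid_size))
    for (x, y) in other_positions:
        gx = int((x - unit_pos[0]) / cell) + grid_size // 2
        gy = int((y - unit_pos[1]) / cell) + grid_size // 2
        if 0 <= gx < grid_size and 0 <= gy < grid_size:
            grid[gy, gx] = 1.0
    return grid.ravel()
```

In this sketch, one such grid would be built per object category (e.g. allies, enemies) and the flattened grids concatenated into the network's input vector, which matches the abstract's description of separate vision grids feeding the networks.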
