An Analysis of Model-Based Heuristic Search Techniques for StarCraft Combat Scenarios

Abstract

Real-Time Strategy (RTS) games have become a popular test-bed for modern AI systems due to their real-time computational constraints, complex multi-unit control problems, and imperfect information. One of the most important aspects of any RTS AI system is the efficient control of units in complex combat scenarios, also known as micromanagement. Recently, a model-based heuristic search technique called Portfolio Greedy Search (PGS) has shown promising performance for real-time decision making in RTS combat scenarios, but it has so far only been tested in SparCraft, an RTS combat simulator. In this paper we present the first integration of PGS into the StarCraft game engine, and compare its performance to the current state-of-the-art deep reinforcement learning method in several benchmark combat scenarios. We then perform the same experiments within the SparCraft simulator in order to investigate any differences between PGS performance in the simulator and in the actual game. Lastly, we investigate how varying the parameters of the SparCraft simulator affects the performance of PGS in the StarCraft game engine. We demonstrate that the performance of PGS relies heavily on the accuracy of the underlying model, outperforming other techniques only in scenarios where the SparCraft simulation model more accurately matches the StarCraft game engine.

Introduction and Background

AI researchers have often used games as a test-bed for evaluating the performance of their artificial intelligence systems. Recently, advances in deep learning techniques, combined with heuristic search, have led to the defeat of professional players of Go and No-Limit Texas Hold'em poker by the AlphaGo (Silver et al. 2016) and DeepStack (Moravčík et al. 2017) programs, heralding the end of human dominance in traditional two-player games. As their next challenge, many AI researchers have chosen to tackle Real-Time Strategy (RTS) video games, which present more complex problems than traditional board games, with properties such as real-time computational constraints, simultaneous multi-unit control, and imperfect information (Ontanón et al. 2013).

Unit micromanagement in RTS games ("micro") is the problem of deciding how to most effectively control the specific movements and actions of units, usually in a combat-related context, and is a key aspect of competitive play. The properties of RTS game combat make it a particularly challenging problem, involving the simultaneous real-time control of dozens of units with varying properties. Each unit on the battlefield can be controlled individually, leading to an exponential number of possible action combinations from which a player must choose at any given state.

A number of heuristic search based approaches for RTS combat have been introduced in recent years, such as Alpha-Beta Considering Durations (ABCD), UCT Considering Durations (UCT-CD), and Portfolio Greedy Search (PGS). Churchill and Buro (2013) demonstrated that PGS outperforms the other search-based methods in medium and large-scale combat scenarios, and it is currently the top-performing heuristic search based method for RTS game combat. As PGS is based on heuristic search, it relies on a forward model of the environment, and its only published results so far have been in simulation, not in the StarCraft game engine.

In this paper we present the first experiments which use PGS to control units in the StarCraft game engine, and explore possible issues related to its reliance on an abstract simulator as a forward model. We compare its performance to the current state-of-the-art technique in deep reinforcement learning, as well as to several baseline scripted players. By performing these experiments both in the real StarCraft game engine and in simulation, we can highlight any differences in results and attempt to draw conclusions about the feasibility of such model-based approaches in real RTS game engines. We begin by introducing each of the StarCraft combat techniques used in our experiments, followed by a description of the combat scenarios used to test them. We then present the results of three separate experiments and discuss our findings.

RTS Unit Micro Scripts

The simplest and most common technique for unit micro in retail RTS games is to use hard-coded scripted behaviours. We define a scripted player as one that implements a static set of scripted rules, similar to a finite state machine or a simple decision tree, but performs no forward look-ahead evaluation or learning. In this paper we use the following common scripts as baseline players against which to compare the performance of PGS and a reinforcement learning policy:

c AttackClosest: If a unit is within attack range of an enemy unit, it will attack the closest enemy unit and wait in the same position until it has reloaded to attack again. If it is not within attack range of any unit, it will move toward the closest enemy unit.

w AttackWeakest: As AttackClosest, but will attack the enemy unit which has the lowest current hit points.

k Kiter: As AttackClosest, but moves away from an enemy unit while reloading instead of standing still. This behaviour is very effective when used by long-ranged attackers against short-ranged attackers and is known as 'kiting'.

h HoldPosition: As AttackClosest, but units will never move from their initial positions.

n NoOverkill: Adds a condition to a script such that no unit will be assigned to attack an enemy unit which has already been assigned predicted lethal damage on this time step.

For brevity, future references to these scripts in this paper may use the single-character abbreviation listed before each script name. These scripts can also have their behaviours combined; for example, the script wckn would be "attack the weakest enemy unit with highest priority, then the closest, kite when reloading, with no overkill".
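To make the scripted behaviours concrete, the following is a minimal C++ sketch of the AttackClosest decision rule and its Kiter variant. All types and names here (Unit, Action, scriptAttackClosest) are simplified illustrations of our own; they do not correspond to the actual BWAPI or SparCraft interfaces.

```cpp
// Minimal sketch of the AttackClosest and Kiter scripts. Types and names
// are illustrative only; they are not the BWAPI or SparCraft interfaces.
#include <cmath>
#include <vector>

struct Unit {
    double x = 0, y = 0;      // position in pixels
    double range = 32;        // attack range in pixels
    int hitPoints = 40;
    int reloadFrames = 0;     // frames remaining until the unit may attack
};

enum class ActionType { Attack, Move, HoldPosition };

struct Action {
    ActionType type = ActionType::HoldPosition;
    const Unit* target = nullptr;   // used by Attack
    double dx = 0, dy = 0;          // movement direction, used by Move
};

static double dist(const Unit& a, const Unit& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

static const Unit* closestEnemy(const Unit& u, const std::vector<Unit>& enemies) {
    const Unit* best = nullptr;
    double bestDist = 0;
    for (const Unit& e : enemies) {
        if (e.hitPoints <= 0) continue;          // ignore dead units
        const double d = dist(u, e);
        if (!best || d < bestDist) { best = &e; bestDist = d; }
    }
    return best;
}

// AttackClosest: attack the closest enemy when in range and reloaded,
// stand still while reloading, otherwise approach the closest enemy.
// With kite == true (the Kiter script), retreat while reloading instead.
Action scriptAttackClosest(const Unit& u, const std::vector<Unit>& enemies, bool kite) {
    const Unit* target = closestEnemy(u, enemies);
    if (!target) return {};                      // no enemies left: hold position
    if (u.reloadFrames > 0) {
        if (kite)                                // Kiter: move directly away
            return { ActionType::Move, nullptr, u.x - target->x, u.y - target->y };
        return {};                               // AttackClosest: wait in place
    }
    if (dist(u, *target) <= u.range)
        return { ActionType::Attack, target };
    return { ActionType::Move, nullptr, target->x - u.x, target->y - u.y };
}
```

The NoOverkill modifier would additionally track the damage already assigned to each enemy unit during the current time step and exclude targets whose assigned damage is already lethal.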
Portfolio Greedy Search

RTS combat scenarios are difficult for traditional heuristic search techniques to solve, due to the exponential number of possible action combinations to choose from at any state. Portfolio Greedy Search is a heuristic search technique specifically designed for decision making in games with large action spaces. Instead of searching all possible action combinations for the units at a given state, PGS reduces the action space by generating unit actions from a set of scripted behaviours which we call a portfolio. Unlike search algorithms such as Minimax or MCTS, PGS does not build a game tree over an exponential number of action combinations, but instead uses a hill-climbing approach to reduce the number of actions searched to a linear number. State evaluations in PGS are carried out via game playouts. A full description of the PGS algorithm is available in (Churchill and Buro 2013), but a brief outline is as follows:

1. PGS takes as input an initial game state and a portfolio of scripts, chosen to cover a range of tactical possibilities.

2. A single script is chosen as the initial seed script for both players, and is assigned to control each unit of the corresponding player; we call this a unit-script assignment.

3. For each unit a player controls, PGS iterates through each script in the portfolio and assigns it to the current unit.

4. A playout is performed to evaluate the current unit-script assignment, simulating the result of combat under that assignment for some number of turns.

5. The best unit-script assignment is chosen as the one which has the maximum playout value.

6. Steps 3-5 can then be repeated any number of times for both players, improving the assignments via self-play.

7. When a time limit is reached, the actions produced by the final unit-script assignment are returned.
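As a concrete illustration of the greedy improvement in steps 3-5, the following is a simplified C++ sketch of one hill-climbing pass. The playout function and data types are stand-ins for a forward model such as SparCraft; this is a sketch of the idea, not the published PGS implementation.

```cpp
// Simplified sketch of one PGS hill-climbing pass (steps 3-5 above).
// Assignment, PlayoutFn, and improve() are illustrative stand-ins,
// not the published PGS implementation.
#include <cstddef>
#include <functional>
#include <vector>

using Script = int;                       // index into the portfolio
using Assignment = std::vector<Script>;   // one script per friendly unit

// Simulate the combat under the given unit-script assignments for a fixed
// number of turns and score the resulting state for our player.
using PlayoutFn =
    std::function<double(const Assignment& ours, const Assignment& theirs)>;

// For every unit, try every script in the portfolio and keep whichever
// choice maximizes the playout value, holding all other units fixed.
Assignment improve(Assignment ours, const Assignment& theirs,
                   int numScripts, const PlayoutFn& playout) {
    for (std::size_t u = 0; u < ours.size(); ++u) {
        Script bestScript = ours[u];
        double bestValue = -1e300;
        for (Script sc = 0; sc < numScripts; ++sc) {
            ours[u] = sc;                              // tentative assignment
            const double v = playout(ours, theirs);
            if (v > bestValue) { bestValue = v; bestScript = sc; }
        }
        ours[u] = bestScript;                          // commit the best script
    }
    return ours;
}
```

Because each unit's script is improved while the others are held fixed, one pass costs a number of playouts linear in the number of units and the portfolio size, rather than searching the exponential space of joint assignments.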
SparCraft Combat Simulator

As PGS performs playout evaluations to decide which actions to perform, it requires a forward model of the environment in order to function. The BWAPI programming interface allows reading the memory of StarCraft and issuing actions to the game, but it does not allow us to directly copy or manipulate local game state instances, which the PGS forward model requires. Therefore, in order to run PGS on combat scenarios in StarCraft, we must use a system which enables us to efficiently simulate the game's combat engine.

The model we use to carry out PGS is SparCraft (Churchill 2016a), a StarCraft combat simulation library written in C++. Specifically designed to be a test-bed for RTS combat algorithms, it models StarCraft combat scenarios in a manner that balances accuracy of simulation with speed of computation, and it has the following features:

• It models all StarCraft unit types that are able to attack and move, along with all of their properties such as hit points, weapon damage, speed, and attack range.

• Units in SparCraft are able to perform the following actions: Move to a given (x, y) pixel location, Attack a target enemy unit, and Hold Position for a given duration.

• SparCraft states can be fast-forwarded to the next time step in which a unit is able to act, allowing it to skip many game frames and save significant computation.

• SparCraft actions have set durations, and must be carried to completion with no interruption. For example, a unit must carry out a Move action to its destination and cannot take any other action or stop until it is completed.

• SparCraft does not currently model fog of war, unit acceleration, or spell casting.

• Importantly, SparCraft does not model unit collisions. This greatly reduces the accuracy of the simulation in comparison to the StarCraft game engine, but was necessary to make it fast enough to facilitate heuristic search. If unit collisions were simulated, SparCraft would also require a multi-agent path-finding system similar to that of the StarCraft game engine, which would be computationally expensive and difficult to implement.
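To illustrate the fast-forwarding feature described above, the following is a hypothetical, simplified sketch of how a simulator can jump directly to the next decision point rather than stepping one game frame at a time. The names and structure are our own and do not match SparCraft's actual API.

```cpp
// Hypothetical sketch of state fast-forwarding: advance the simulation
// directly to the next frame at which any unit can act. Names are
// illustrative and do not match SparCraft's actual API.
#include <algorithm>
#include <limits>
#include <vector>

struct SimUnit {
    int nextActionFrame = 0;   // frame at which this unit's current action ends
    int hitPoints = 40;
};

struct SimState {
    int currentFrame = 0;
    std::vector<SimUnit> units;   // both players' units

    // Jump to the earliest frame at which some living unit can act.
    void fastForward() {
        int next = std::numeric_limits<int>::max();
        for (const SimUnit& u : units)
            if (u.hitPoints > 0)
                next = std::min(next, u.nextActionFrame);
        if (next != std::numeric_limits<int>::max() && next > currentFrame)
            currentFrame = next;   // all frames in between are skipped
    }
};
```

Because SparCraft actions have fixed durations, every unit's next decision frame is known in advance, which is what makes this kind of frame skipping possible and keeps playout evaluation cheap enough for real-time search.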