Adaptive High-Level Strategy Learning in StarCraft

Reinforcement learning (RL) is a technique for computing an optimal policy in stochastic settings: actions from an initial policy are simulated (or directly executed), and the value of each state is updated based on the immediate rewards obtained as the policy is executed. Existing efforts model opponents in competitive games as elements of a stochastic environment and use RL to learn policies against such opponents. In this setting, the rate of change of the state values decreases monotonically over time as learning converges. This modeling assumes that the opponent's strategy is static over time, an assumption that is too strong when human opponents are possible. Consequently, in this paper, we develop a meta-level RL mechanism that detects when an opponent changes strategy and allows the state values to “deconverge” in order to learn how to play against a different strategy. We validate this approach empirically for high-level strategy selection in the game StarCraft: Brood War.
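The abstract does not spell out the detection rule, so the following Python sketch only illustrates the general idea under stated assumptions. The learning rate decays as values converge, and a meta-level check restores it when the recent average prediction error becomes large, which lets the values "deconverge". The class name, the windowed-error test, the strategy names, the payoffs, and all parameter values are hypothetical and are not taken from the paper.

```python
from collections import deque


class AdaptiveStrategyLearner:
    """Value learner over high-level strategies with a meta-level reset.

    Assumption: a strategy change by the opponent shows up as a large
    average absolute prediction error over a short window of episodes.
    """

    def __init__(self, strategies, alpha0=0.5, alpha_min=0.05,
                 decay=0.9, window=5, threshold=0.5):
        self.values = {s: 0.0 for s in strategies}
        self.alpha0 = alpha0          # initial (and restored) learning rate
        self.alpha_min = alpha_min    # floor for the decayed learning rate
        self.decay = decay            # multiplicative decay per update
        self.alpha = alpha0
        self.errors = deque(maxlen=window)
        self.threshold = threshold    # mean |error| that triggers a reset
        self.resets = 0

    def best(self):
        """Return the strategy with the highest current value estimate."""
        return max(self.values, key=self.values.get)

    def update(self, strategy, reward):
        """Move the value of `strategy` toward the observed reward."""
        error = reward - self.values[strategy]
        self.values[strategy] += self.alpha * error
        self.errors.append(abs(error))

        # Base level: the learning rate shrinks, so values converge.
        self.alpha = max(self.alpha_min, self.alpha * self.decay)

        # Meta level: a full window of large errors suggests the opponent
        # changed strategy, so restore the learning rate ("deconverge").
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and sum(self.errors) / len(self.errors) > self.threshold:
            self.alpha = self.alpha0
            self.errors.clear()
            self.resets += 1


def play(learner, payoff, rounds):
    """Play `rounds` episodes, cycling through strategies deterministically.

    `payoff` maps each strategy to a fixed, hypothetical reward against the
    opponent's current strategy. A real agent would explore (for example
    epsilon-greedy) and observe noisy game outcomes instead.
    """
    names = list(payoff)
    for i in range(rounds):
        strategy = names[i % len(names)]
        learner.update(strategy, payoff[strategy])


if __name__ == "__main__":
    learner = AdaptiveStrategyLearner(["rush", "turtle"])
    play(learner, {"rush": 1.0, "turtle": 0.0}, rounds=20)   # opponent plays A
    print("after phase 1:", learner.best(), learner.values)
    play(learner, {"rush": -1.0, "turtle": 1.0}, rounds=20)  # opponent switches
    print("after phase 2:", learner.best(), learner.values, learner.resets)
```

Setting `threshold=float("inf")` disables the meta-level check and gives a plain decaying-rate learner for comparison. In this toy setting the plain learner adapts to the switch much more slowly, because its learning rate has already decayed to the floor.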
