Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-Time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation, where the observation encodes the whole game state and the RL agent must choose both which unit to issue an action to and which action to execute; and (2) a local representation, where the observation is given from the point of view of an individual unit and the RL agent picks an action for each unit independently. We evaluate these representations in $\mu$RTS, showing that the local representation seems to outperform the global one when training agents on the task of harvesting resources.
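To make the distinction concrete, the following is a minimal sketch of how the two representations could be expressed as gym-style observation and action spaces. It is not the paper's actual encoding: the map size, number of feature planes, action set, and egocentric window size are all assumptions chosen for illustration.

```python
# Hypothetical sketch of the two representations as gym spaces.
# All sizes below (map, feature planes, action set, window) are assumptions,
# not the encoding used in the paper.
from gym import spaces

MAP_W, MAP_H = 8, 8      # assumed map size
N_PLANES = 5             # assumed feature planes (e.g., terrain, units, resources)
N_ACTION_TYPES = 6       # assumed action set (e.g., no-op, move x4, harvest)

# (1) Global representation: one observation of the whole game state; the agent
# must pick both a unit (here, a cell on the map) and an action type for it.
global_observation = spaces.Box(low=0.0, high=1.0,
                                shape=(N_PLANES, MAP_H, MAP_W))
global_action = spaces.MultiDiscrete([MAP_W * MAP_H,    # which unit/cell to command
                                      N_ACTION_TYPES])  # which action to execute

# (2) Local representation: one observation per unit (e.g., an egocentric window
# centered on it); the policy picks an action for each unit independently.
WINDOW = 5               # assumed egocentric window size
local_observation = spaces.Box(low=0.0, high=1.0,
                               shape=(N_PLANES, WINDOW, WINDOW))
local_action = spaces.Discrete(N_ACTION_TYPES)
```

Under this framing, the global representation yields a single large decision per step, while the local representation turns unit control into many small, identical decisions that can share one policy.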
