Spatial Action Decomposition Learning Applied to RTS Combat Games

Learning good policies for multi-agent systems is a complex task. Existing methods are often limited to a small number of agents, as learning becomes intractable as the number of agents grows. In this paper we describe Spatial Action Decomposition Learning, which aims to overcome the inefficiencies of standard multi-agent Q-learning by exploiting existing spatial action correlations. We apply our method to real-time strategy (RTS) game combat scenarios and show that systems based on Spatial Action Decomposition Learning can outperform handcrafted scripts and policies optimized by independent Q-learning.
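Although the implementation details are not included in this abstract, the core idea, factoring a team's joint action value across map regions so that learning scales with the map rather than with the number of agents, can be illustrated with a small sketch. The PyTorch code below assumes a VDN-style additive decomposition over a coarse sector grid: a convolutional network outputs one Q-value per (sector, abstract action) pair, and the team value is the sum of the chosen per-sector values. The class name SpatialQNet, the 4x4 sector resolution, and the five abstract sector actions are hypothetical choices for illustration, not the paper's code.

# Minimal sketch, assuming a VDN-style additive decomposition of the
# team Q-value across spatial sectors. Names, layer sizes, and the
# abstract action set are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialQNet(nn.Module):
    """Maps a grid observation to per-sector Q-values over abstract actions."""

    def __init__(self, in_channels: int = 4, n_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Downsample the map to a coarse sector grid, then emit one
        # Q-value per abstract action (e.g., hold / move N,E,S,W) per sector.
        self.sector_pool = nn.AdaptiveAvgPool2d((4, 4))  # 4x4 sectors (assumed)
        self.q_head = nn.Conv2d(32, n_actions, kernel_size=1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, in_channels, H, W) unit-feature planes
        h = self.features(obs)
        h = self.sector_pool(h)
        return self.q_head(h)  # (batch, n_actions, 4, 4)


def team_q(per_sector_q: torch.Tensor) -> torch.Tensor:
    """Greedy team value: max over actions in each sector, summed over sectors."""
    best_per_sector, _ = per_sector_q.max(dim=1)  # (batch, 4, 4)
    return best_per_sector.sum(dim=(1, 2))        # (batch,)


if __name__ == "__main__":
    net = SpatialQNet()
    obs = torch.randn(2, 4, 64, 64)  # two 64x64 maps with 4 feature planes
    q = net(obs)
    print(q.shape, team_q(q))        # torch.Size([2, 5, 4, 4]) and two team values

The additive form is what keeps joint action selection tractable under this reading: because the team value is a sum of per-sector terms, the greedy joint action is obtained by an independent argmax in each sector, avoiding the exponential blow-up of enumerating joint actions over all agents.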
