Extending World Models for Multi-Agent Reinforcement Learning in MALMÖ

Recent work in (deep) reinforcement learning has increasingly looked to develop better agents for multi-agent and multitask scenarios, as many successes have already been achieved in the usual single-agent, single-task setting. In this paper, we propose a solution for a recently released benchmark that tests agents in such scenarios, namely the MARLÖ competition. Following the 2018 Jeju Deep Learning Camp, we consider a combined approach based on various ideas generated during the camp, as well as on suggestions for building agents drawn from recent research trends, similar to the methodology taken in developing Rainbow (Hessel et al., 2017). These choices include: model-based agents, which allow for planning and simulation and reduce the computational cost of learning controllers; distributional reinforcement learning, which reduces the losses incurred by relying on mean estimators; curriculum learning for task selection when tasks differ in difficulty; and graph neural networks as a mechanism for communication between agents. We motivate each of these approaches and discuss a combined approach that we believe will fare well in the competition.
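To make the model-based and distributional ingredients concrete, the following is a minimal sketch, not the competition implementation: a world-model-style agent in the spirit of Ha and Schmidhuber [8], with an encoder that compresses observations into a latent vector, a recurrent dynamics model that can be unrolled for cheap simulated rollouts, and a C51-style categorical value head [5] in place of a mean Q-value. All module names (LatentEncoder, Dynamics, DistributionalController), dimensions, and the choice of PyTorch are illustrative assumptions.

```python
# Hypothetical sketch of a world-model agent with a distributional controller.
# Module names, sizes, and wiring are assumptions for illustration only.
import torch
import torch.nn as nn

LATENT_DIM = 32
N_ACTIONS = 5
N_ATOMS = 51                 # atoms in the categorical return distribution
V_MIN, V_MAX = -10.0, 10.0   # support of the return distribution


class LatentEncoder(nn.Module):
    """Compress a flattened observation into a latent vector z
    (the 'V' component in the World Models decomposition)."""
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT_DIM))

    def forward(self, obs):
        return self.net(obs)


class Dynamics(nn.Module):
    """Recurrent model of latent dynamics (the 'M' component); unrolling it
    instead of the environment yields cheap simulated trajectories."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(LATENT_DIM + N_ACTIONS, LATENT_DIM)

    def forward(self, z, action_onehot, h):
        return self.rnn(torch.cat([z, action_onehot], dim=-1), h)


class DistributionalController(nn.Module):
    """C51-style head: per action, a categorical distribution over returns
    rather than a single mean Q-value [5]; the controller reads the
    recurrent state h of the dynamics model."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(LATENT_DIM, N_ACTIONS * N_ATOMS)
        self.register_buffer("support", torch.linspace(V_MIN, V_MAX, N_ATOMS))

    def forward(self, h):
        logits = self.head(h).view(-1, N_ACTIONS, N_ATOMS)
        probs = logits.softmax(dim=-1)             # (batch, actions, atoms)
        q_values = (probs * self.support).sum(-1)  # means, used only to act
        return probs, q_values


# One acting step: encode the observation, pick a greedy action from the
# distributional head, and advance the latent state with the dynamics model.
obs_dim = 64
enc, dyn, ctrl = LatentEncoder(obs_dim), Dynamics(), DistributionalController()
obs = torch.randn(1, obs_dim)
h = torch.zeros(1, LATENT_DIM)
z = enc(obs)
probs, q = ctrl(h)
action = q.argmax(dim=-1)
a_onehot = torch.nn.functional.one_hot(action, N_ACTIONS).float()
h = dyn(z, a_onehot, h)
```

A controller this small is where the computational savings come from: in the original World Models setup it is cheap enough to be trained entirely inside the learned dynamics model, for example with an evolution strategy such as CMA-ES [6].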

[1] Jakob N. Foerster et al. Counterfactual Multi-Agent Policy Gradients. AAAI, 2017.

[2] Guigang Zhang et al. Deep Learning. Int. J. Semantic Comput., 2016.

[3] Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.

[4] Marc G. Bellemare et al. As Expected? An Analysis of Distributional Reinforcement Learning. 2018.

[5] Marc G. Bellemare et al. A Distributional Perspective on Reinforcement Learning. ICML, 2017.

[6] Nikolaus Hansen et al. Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation, 2001.

[7] Volodymyr Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. ICML, 2016.

[8] David Ha and Jürgen Schmidhuber. World Models. arXiv, 2018.

[9] Tetsuro Morimura et al. Nonparametric Return Distribution Approximation for Reinforcement Learning. ICML, 2010.

[10] Tom M. Mitchell. The Need for Biases in Learning Generalizations. 1980.

[11] Matteo Hessel et al. Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI, 2017.

[12] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[13] Tingwu Wang et al. NerveNet: Learning Structured Policy with Graph Neural Networks. ICLR, 2018.

[14] Kai Arulkumaran et al. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Processing Magazine, 2017.

[15] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 1992.

[16] Peter Sunehag et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward. AAMAS, 2018.

[17] Lars Buesing et al. Learning and Querying Fast Generative Models for Reinforcement Learning. arXiv, 2018.

[18] Yanhai Xiong et al. HogRider: Champion Agent of Microsoft Malmo Collaborative AI Challenge. AAAI, 2018.

[19] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 1992.