Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.

[1]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.

[2]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[3]  Mykel J. Kochenderfer,et al.  Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.

[4]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[5]  Bowen Zhou,et al.  A Structured Self-attentive Sentence Embedding , 2017, ICLR.

[6]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[7]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .

[8]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[9]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[10]  Sergey Levine,et al.  Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[11]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[12]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[13]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[14]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Yuval Tassa,et al.  Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[17]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[18]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[19]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[20]  Jordan L. Boyd-Graber,et al.  Opponent Modeling in Deep Reinforcement Learning , 2016, ICML.

[21]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[22]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[23]  Felix A. Fischer,et al.  Hierarchical reinforcement learning in communication-mediated multiagent coordination , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[24]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[25]  Koray Kavukcuoglu,et al.  Multiple Object Recognition with Visual Attention , 2014, ICLR.

[26]  Honglak Lee,et al.  Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.

[27]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[28]  Dorian Kodelja,et al.  Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[29]  Zongqing Lu,et al.  Learning Attentional Communication for Multi-Agent Cooperation , 2018, NeurIPS.

[30]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[31]  Drew Wicke,et al.  Multiagent Soft Q-Learning , 2018, AAAI Spring Symposia.

[32]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[33]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[34]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[35]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[36]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[37]  Byoung-Tak Zhang,et al.  Multi-focus Attention Network for Efficient Deep Reinforcement Learning , 2017, AAAI Workshops.