R-MADDPG for Partially Observable Environments and Limited Communication

There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including coordination among self-driving cars. The real world poses challenging conditions for multiagent learning systems, such as partial observability and nonstationarity. Moreover, if agents must share a limited resource (e.g., network bandwidth), they must all learn how to coordinate its use. This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate the effects of recurrence on the performance and communication usage of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.
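The core architectural idea is to give each agent's policy a recurrent state so it can act sensibly when observations are missing or delayed. The sketch below is a minimal illustration of that idea in PyTorch: an actor whose LSTM hidden state is carried across timesteps. The layer sizes, names, and dimensions are illustrative assumptions, not the paper's exact R-MADDPG architecture.

```python
# Minimal sketch (PyTorch) of a recurrent actor, the kind of policy used
# under partial observability: the LSTM hidden state summarizes past
# observations.  All sizes/names below are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden: optional (h, c) LSTM state
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        # Squash to bounded continuous actions, as in DDPG-style actors
        actions = torch.tanh(self.head(x))
        return actions, hidden

# Usage: one agent acting step by step while carrying its recurrent state.
actor = RecurrentActor(obs_dim=8, act_dim=2)
hidden = None
obs = torch.randn(1, 1, 8)           # (batch=1, time=1, obs_dim)
action, hidden = actor(obs, hidden)  # hidden is reused at the next timestep
```

In a MADDPG-style setup, such actors would be trained with centralized critics that see all agents' observations and actions; the sketch only covers the per-agent recurrent policy.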
