Deep Coordination Graphs

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factorizing the joint value function of all agents according to a coordination graph into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allows training of the value function end-to-end with Q-learning. Payoff functions are approximated with deep neural networks, and parameter sharing improves generalization over the state-action space. We show that DCG can solve challenging predator-prey tasks that are vulnerable to the relative overgeneralization pathology and in which all other known value factorization approaches fail.
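The two mechanisms named in the abstract, a joint value built from pairwise payoffs and its maximization by local message passing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`joint_value`, `max_sum`, `brute_force`), the random payoff tables standing in for neural-network outputs, the three-agent chain graph, and the mean-subtraction used to normalize messages are all assumptions made for illustration. Any weighting of the terms that the paper may apply is omitted.

```python
import itertools
import numpy as np

def joint_value(utilities, payoffs, actions):
    """Sum of per-agent utilities and pairwise payoffs for one joint action."""
    total = sum(u[actions[i]] for i, u in enumerate(utilities))
    total += sum(p[actions[i], actions[j]] for (i, j), p in payoffs.items())
    return float(total)

def max_sum(utilities, payoffs, n_iters=8):
    """Greedy joint action via max-sum message passing along the graph edges."""
    n_actions = utilities[0].shape[0]
    # One message per directed edge; msgs[(i, j)] is indexed by agent j's action.
    msgs = {}
    for i, j in payoffs:
        msgs[(i, j)] = np.zeros(n_actions)
        msgs[(j, i)] = np.zeros(n_actions)
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # Orient the payoff table as [a_i, a_j].
            table = payoffs[(i, j)] if (i, j) in payoffs else payoffs[(j, i)].T
            # Everything agent i knows, excluding what agent j told it.
            incoming = utilities[i] + sum(
                (m for (k, t), m in msgs.items() if t == i and k != j),
                np.zeros(n_actions))
            m = (incoming[:, None] + table).max(axis=0)
            new[(i, j)] = m - m.mean()  # normalize; does not change the argmax
        msgs = new
    actions = []
    for i, u in enumerate(utilities):
        belief = u + sum((m for (k, t), m in msgs.items() if t == i),
                         np.zeros(n_actions))
        actions.append(int(belief.argmax()))
    return actions

def brute_force(utilities, payoffs):
    """Exhaustive argmax over all joint actions, for checking small examples."""
    n_actions = utilities[0].shape[0]
    joint_actions = itertools.product(range(n_actions), repeat=len(utilities))
    return list(max(joint_actions, key=lambda a: joint_value(utilities, payoffs, a)))

# Hypothetical example: three agents on a chain 0 - 1 - 2, three actions each.
rng = np.random.default_rng(0)
utilities = [rng.normal(size=3) for _ in range(3)]
payoffs = {(0, 1): rng.normal(size=(3, 3)), (1, 2): rng.normal(size=(3, 3))}
greedy = max_sum(utilities, payoffs)
```

On a tree-shaped graph such as this chain, message passing recovers the same joint action as exhaustive search whenever the maximum is unique. On graphs with cycles it is only approximate, and the exhaustive check grows exponentially with the number of agents, which is why the local scheme matters.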

Shimon Whiteson | Wendelin Böhmer | Vitaly Kurin

[1] Guy Lever, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward, 2018, AAMAS.

[2] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.

[3] Gerhard Lakemeyer, et al. Exploring artificial intelligence in the new millennium, 2003.

[4] Tiejun Huang, et al. Graph Convolutional Reinforcement Learning, 2020, ICLR.

[5] Daniel Kudenko, et al. MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning, 2019, 2019 XVI International Symposium "Problems of Redundancy in Information and Control Systems" (REDUNDANCY).

[6] Frans A. Oliehoek, et al. Coordinated Deep Reinforcement Learners for Traffic Light Control, 2016.

[7] Nikos A. Vlassis, et al. Collaborative Multiagent Reinforcement Learning by Payoff Propagation, 2006, J. Mach. Learn. Res.

[8] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[9] Ming Zhou, et al. Mean Field Multi-Agent Reinforcement Learning, 2018, ICML.

[10] Yung Yi, et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning, 2019, ICML.

[11] Mariagrazia Dotoli, et al. Advanced control in factory automation: a survey, 2017, Int. J. Prod. Res.

[12] H. Francis Song, et al. Relational Forward Models for Multi-Agent Learning, 2018, ICLR.

[13] V. S. Glukhov, et al. Idiosyncrasies and challenges of data driven learning in electronic trading, 2018, 1811.09549.

[14] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.

[15] Drew Wicke, et al. Multiagent Soft Q-Learning, 2018, AAAI Spring Symposia.

[16] Shobha Venkataraman, et al. Context-specific multiagent coordination and planning with factored MDPs, 2002, AAAI/IAAI.

[17] William T. Freeman, et al. Understanding belief propagation and its generalizations, 2003.

[18] Javier Alonso-Mora, et al. Multi-robot formation control and object transport in dynamic environments via constrained optimization, 2017, Int. J. Robotics Res.

[19] Xi Chen, et al. Learning From Demonstration in the Wild, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[20] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.

[21] R. Paul Wiegand, et al. Biasing Coevolutionary Search for Optimal Multiagent Behaviors, 2006, IEEE Transactions on Evolutionary Computation.

[22] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[23] Carlos M. Correa-Posada, et al. Integrated Power and Natural Gas Model for Energy Adequacy in Short-Term Operation, 2015, IEEE Transactions on Power Systems.

[24] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.

[25] Yujing Hu, et al. Multi-Agent Game Abstraction via Graph Attention Neural Network, 2019, AAAI.

[26] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.

[27] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.

[28] Davood Hajinezhad, et al. A Review of Cooperative Multi-Agent Deep Reinforcement Learning, 2019, ArXiv.

[29] Shimon Whiteson, et al. The StarCraft Multi-Agent Challenge, 2019, AAMAS.

[30] Nicholas R. Jennings, et al. Bounded approximate decentralised coordination via the max-sum algorithm, 2009, Artif. Intell.

[31] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.

[32] Frans A. Oliehoek, et al. A Concise Introduction to Decentralized POMDPs, 2016, SpringerBriefs in Intelligent Systems.

[33] Shimon Whiteson, et al. The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning, 2019, AAMAS.

[34] Roie Zivan, et al. Applying max-sum to teams of mobile sensing agents, 2018, Eng. Appl. Artif. Intell.

[35] Shimon Whiteson, et al. Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning, 2019, ArXiv.

[36] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[37] Shimon Whiteson, et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning, 2016, NIPS.

[38] Avi Pfeffer, et al. Loopy Belief Propagation as a Basis for Communication in Sensor Networks, 2002, UAI.

[39] Michael I. Jordan, et al. Loopy Belief Propagation for Approximate Inference: An Empirical Study, 1999, UAI.

[40] Ying Wen, et al. Factorized Q-learning for large-scale multi-agent systems, 2018, DAI.

[41] Shimon Whiteson, et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning, 2017, ICML.

[42] M. Stanković. Multi-agent reinforcement learning, 2016.

[43] Shimon Whiteson, et al. Multi-Agent Common Knowledge Reinforcement Learning, 2018, NeurIPS.

[44] Bikramjit Banerjee, et al. Multi-agent reinforcement learning as a rehearsal for decentralized planning, 2016, Neurocomputing.

[45] Martin J. Wainwright, et al. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations, 2004, Stat. Comput.

[46] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.

[47] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.

[48] Emilio Frazzoli, et al. On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment, 2017, Proceedings of the National Academy of Sciences.

[49] Judea Pearl, et al. Probabilistic reasoning in intelligent systems - networks of plausible inference, 1991, Morgan Kaufmann series in representation and reasoning.

[50] Razvan Pascanu, et al. Relational inductive biases, deep learning, and graph networks, 2018, ArXiv.