Deep Coordination Graphs

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factorizing the joint value function of all agents, according to a coordination graph, into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allows the value function to be trained end-to-end with Q-learning. Payoff functions are approximated with deep neural networks, and parameter sharing improves generalization over the state-action space. We show that DCG can solve challenging predator-prey tasks that are vulnerable to the relative overgeneralization pathology and in which all other known value factorization approaches fail.
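
To make the factorization concrete, the following is a minimal sketch, not the authors' implementation. It assumes the joint value is the sum of one payoff table per edge of the coordination graph and maximizes it with a generic max-plus message-passing routine. The function names (`joint_value`, `max_plus`), the chain-shaped graph, and the random payoff tables (stand-ins for the outputs of neural networks at one state) are all illustrative assumptions.

```python
import itertools
import numpy as np


def joint_value(payoffs, actions):
    """Joint value as a sum of pairwise payoffs f_ij(a_i, a_j) over all edges."""
    return sum(f[actions[i], actions[j]] for (i, j), f in payoffs.items())


def max_plus(payoffs, n_agents, n_actions, n_iters=10):
    """Approximately maximize the joint value by max-plus message passing.

    payoffs maps an edge (i, j) to a table of shape (n_actions, n_actions).
    The result is exact on tree-structured graphs once messages converge;
    on graphs with cycles it is only a heuristic.
    """
    # messages[(i, j)][a_j] is the message sent from agent i to agent j.
    messages = {}
    for (i, j) in payoffs:
        messages[(i, j)] = np.zeros(n_actions)
        messages[(j, i)] = np.zeros(n_actions)

    def incoming(i, exclude=None):
        # Sum of all messages arriving at agent i, optionally skipping one sender.
        total = np.zeros(n_actions)
        for (sender, target), msg in messages.items():
            if target == i and sender != exclude:
                total += msg
        return total

    for _ in range(n_iters):
        new = {}
        for (i, j), f in payoffs.items():
            # i -> j: maximize over a_i (rows) for every a_j (columns).
            new[(i, j)] = (f + incoming(i, exclude=j)[:, None]).max(axis=0)
            # j -> i: maximize over a_j (columns) for every a_i (rows).
            new[(j, i)] = (f + incoming(j, exclude=i)[None, :]).max(axis=1)
        # Subtracting the mean keeps messages bounded without changing any argmax.
        messages = {edge: msg - msg.mean() for edge, msg in new.items()}

    # Each agent picks the action that maximizes its summed incoming messages.
    return tuple(int(np.argmax(incoming(i))) for i in range(n_agents))


# Hypothetical example: 4 agents on a chain 0-1-2-3, 3 actions each.
rng = np.random.default_rng(0)
payoffs = {edge: rng.normal(size=(3, 3)) for edge in [(0, 1), (1, 2), (2, 3)]}

greedy = max_plus(payoffs, n_agents=4, n_actions=3)
brute_force = max(
    itertools.product(range(3), repeat=4),
    key=lambda actions: joint_value(payoffs, actions),
)
print(greedy, brute_force)
```

On this tree-shaped example the message-passing result can be checked against exhaustive search over all 3^4 joint actions. The practical appeal is that each message only maximizes over a single agent's actions, so the cost grows with the number of edges rather than with the size of the joint action space.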
