Multi-Agent Feature Learning and Integration for Mixed Cooperative and Competitive Environments

Most centralized training with decentralized execution (CTDE) algorithms for multi-agent reinforcement learning (MARL) achieve good results in homogeneous scenarios. In heterogeneous scenarios, however, differing agent roles, the difficulty of modeling cooperation, and the credit assignment problem make effective collective strategies hard to learn. In this paper, we propose a method for learning and integrating cooperation features. For feature learning, a graph attention network reduces the relationships between agents to a graph adjacency-matrix representation, so that each agent's feature vector carries relational attributes. For feature integration, we concatenate the learned features and apply batch normalization (BN). The design is end-to-end, so agent relations are modeled directly during training, and the attention mechanism strengthens communication between interrelated agents. Experiments show that our method yields significant improvements in heterogeneous cooperative-competitive multi-agent scenarios, and visualizing the attention outputs reveals reasonable collaborative behavior and focused-attack policies.
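Since only the abstract is available here, the following is a minimal PyTorch sketch of the pipeline it describes, not the authors' implementation: a single-head graph attention layer produces a soft adjacency matrix over the agents and attention-weighted relational features, which are then concatenated with each agent's own embedding and batch-normalized. All names and dimensions (`RelationalFeatureLayer`, `obs_dim`, `hidden_dim`, the 5-agent example) are illustrative assumptions.

```python
# Hypothetical sketch of graph-attention feature learning + BN integration;
# not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalFeatureLayer(nn.Module):
    """Single-head graph attention over agents, followed by BN integration."""
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden_dim)   # per-agent embedding
        self.attn = nn.Linear(2 * hidden_dim, 1)       # pairwise attention score
        self.bn = nn.BatchNorm1d(2 * hidden_dim)       # feature-integration step

    def forward(self, obs):
        # obs: (n_agents, obs_dim) -> per-agent embeddings h: (n_agents, hidden_dim)
        h = F.relu(self.encode(obs))
        n = h.size(0)
        # All pairwise concatenations [h_i || h_j]: (n_agents, n_agents, 2*hidden_dim)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )
        # Row-normalized attention scores act as a soft graph adjacency matrix
        # encoding the relationships between agents.
        adj = F.softmax(F.leaky_relu(self.attn(pairs)).squeeze(-1), dim=-1)
        # Relational features: attention-weighted aggregation over neighbors.
        rel = adj @ h                                   # (n_agents, hidden_dim)
        # Integration: concatenate own and relational features, then batch-
        # normalize across agents before any downstream policy/value heads.
        return self.bn(torch.cat([h, rel], dim=-1))

# Example: 5 agents with 16-dimensional observations.
layer = RelationalFeatureLayer(obs_dim=16, hidden_dim=32)
feats = layer(torch.randn(5, 16))                       # (5, 64)
```

In a CTDE setup, these normalized features would feed decentralized policy heads and a centralized critic (omitted here), and the learned adjacency matrix can be visualized to inspect which agents attend to one another.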
