Existing multi-agent reinforcement learning (MARL) methods model coordination between agents at either the global level or the neighborhood level, while coordination between individual agents remains an open problem. Analyzing agents' roles and the correlation between individual agents is crucial for learning an optimal coordinated policy in unknown multi-agent environments. To this end, we propose an agent-level coordination based MARL method. The method consists of two parts: (1) correlation analysis between individual agents based on the Pearson, Spearman, and Kendall correlation coefficients; and (2) an agent-level coordinated training framework in which communication messages between weakly correlated agents are dropped and a correlation-based reward function is built. The proposed method is evaluated in four mixed cooperative-competitive environments. Experimental results show that it outperforms state-of-the-art MARL methods and measures the correlation between individual agents accurately.

Introduction

In recent years, multi-agent reinforcement learning (MARL) has gained increasing attention with the development of single-agent reinforcement learning (Sutton and Barto 2018), deep learning (LeCun, Bengio, and Hinton 2015), and multi-agent systems (Wooldridge 2009). Many successful single-agent reinforcement learning methods, including DQN (Mnih et al. 2015) and DDPG (Lillicrap et al. 2015), have been extended to multi-agent systems. However, directly extending single-agent reinforcement learning methods to multi-agent environments raises a major challenge: coordination between agents (Hernandez-Leal, Kartal, and Taylor 2019).

Based on the coordination structure between agents, existing MARL coordination algorithms can be divided into two groups: global-level coordination methods (Sunehag et al. 2018; Rashid et al. 2018; Son et al. 2019; Wen et al. 2020) and neighborhood-level coordination methods (Yang et al. 2018; Ganapathi Subramanian et al. 2020). In global-level coordination methods, all agents share observations and actions with each other, and a virtual agent is formed to learn a centralized value function for all agents. In contrast, in neighborhood-level coordination methods, coordination exists only within the neighborhood of each agent: agents share their actions and observations only with their neighbors, and during training, agents in the same neighborhood are merged into a virtual agent.

Both global-level and neighborhood-level methods treat coordination between agents as a whole, while coordination between individual agents is ignored. In many real scenarios, different agents play different roles in the environment and cannot simply be merged into a virtual agent. For example, in a soccer game, a forward is in a cooperative relationship with his teammates, who all aim to kick the ball into the opposing team's goal; at the same time, he is in a competitive relationship with the opposing players, who try to prevent him from scoring.
The two teams cannot be analyzed as a single agent because their aims are completely opposite. Even within the same team, the forward has different correlations with different teammates: a stronger cooperative relationship with players mainly involved in offense, such as centers, and a weaker one with players mainly responsible for defense, such as guards. Thus, analyzing the correlation between individual agents is important for learning a coordinated strategy. The three coordination structures of MARL methods (global-level, neighborhood-level, and agent-level) are illustrated in Figure 1.

Unfortunately, in unknown environments, the correlation between individual agents is usually not given. During interaction with the environment, the information an agent obtains directly consists of the state of the environment, the joint action, and the received rewards. Based on how the rewards of different agents change, we can perform a preliminary analysis of the correlation between individual agents: if the rewards of two agents increase or decrease simultaneously, the two agents are more likely to be in a cooperative relationship; conversely, if the reward of one agent increases while the reward of the other decreases, they are more likely to be in a competitive relationship. However, such a rule alone is too simple to identify the correlation between agents accurately.
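The idea above, inferring the relationship between two agents from how their rewards co-vary, can be made concrete with the correlation coefficients named in the abstract. The following is a minimal sketch, not the paper's implementation: it estimates pairwise correlation from logged reward sequences using scipy's Pearson, Spearman, and Kendall statistics, then thresholds the averaged absolute value to flag weakly correlated pairs whose communication messages could be dropped. The function names, the averaging of the three coefficients, and the 0.3 threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau


def reward_correlation(r_i, r_j):
    """Pearson, Spearman, and Kendall coefficients between two agents'
    reward sequences collected over the same time steps."""
    return (pearsonr(r_i, r_j)[0],
            spearmanr(r_i, r_j)[0],
            kendalltau(r_i, r_j)[0])


def communication_mask(reward_history, threshold=0.3):
    """Boolean mask over agent pairs: True keeps the communication link,
    False marks a weakly correlated pair whose messages could be dropped.
    The 0.3 threshold and the use of averaged |coefficients| are
    illustrative assumptions, not values taken from the paper."""
    n_agents = reward_history.shape[0]
    keep = np.eye(n_agents, dtype=bool)
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            coeffs = reward_correlation(reward_history[i], reward_history[j])
            # Strong positive (cooperative) or strong negative (competitive)
            # correlation both indicate a meaningful relationship.
            strength = np.mean(np.abs(coeffs))
            keep[i, j] = keep[j, i] = strength >= threshold
    return keep


# Toy check: agents 0 and 1 share a common reward signal (cooperative),
# while agent 2 receives unrelated rewards, so its links are flagged for dropout.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
rewards = np.stack([base + 0.1 * rng.normal(size=200),
                    base + 0.1 * rng.normal(size=200),
                    rng.normal(size=200)])
print(communication_mask(rewards))
```

The sketch only illustrates the correlation-measurement step; in the paper this estimate additionally feeds an agent-level coordinated training framework with a correlation-based reward.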
References

[1] Yung Yi et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. ICML, 2019.
[2] Ming Zhou et al. Mean Field Multi-Agent Reinforcement Learning. ICML, 2018.
[3] Gerhard Nahler et al. Pearson Correlation Coefficient. Definitions, 2020.
[4] L. Myers et al. Spearman Correlation Coefficients, Differences between. 2004.
[5] Pieter Abbeel et al. Emergence of Grounded Compositional Language in Multi-Agent Populations. AAAI, 2017.
[6] Matthew E. Taylor et al. A Survey and Critique of Multiagent Deep Reinforcement Learning. Autonomous Agents and Multi-Agent Systems, 2019.
[7] Yi Wu et al. Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient. AAAI, 2019.
[8] Shane Legg et al. Human-Level Control through Deep Reinforcement Learning. Nature, 2015.
[9] Shimon Whiteson et al. Counterfactual Multi-Agent Policy Gradients. AAAI, 2017.
[10] Guigang Zhang et al. Deep Learning. Int. J. Semantic Comput., 2016.
[11] Chao Wen et al. SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[12] Yuval Tassa et al. Continuous Control with Deep Reinforcement Learning. ICLR, 2015.
[13] H. Abdi. The Kendall Rank Correlation Coefficient. 2007.
[14] Yi Wu et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NIPS, 2017.
[15] Taeyoung Lee et al. Learning to Schedule Communication in Multi-Agent Reinforcement Learning. ICLR, 2019.
[16] Youngchul Sung et al. Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning. AAAI, 2019.
[17] Neil Salkind. Encyclopedia of Measurement and Statistics. 2006.
[18] Richard S. Sutton et al. Reinforcement Learning: An Introduction. IEEE Trans. Neural Networks, 1998.
[19] Aditya Mahajan et al. Reinforcement Learning in Stationary Mean-Field Games. AAMAS, 2019.
[20] Zhuoran Yang et al. Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning. ICML, 2020.
[21] Pascal Poupart et al. Multi Type Mean Field Reinforcement Learning. AAMAS, 2020.
[22] Barbara Messing et al. An Introduction to MultiAgent Systems. Künstliche Intell., 2002.
[23] Shimon Whiteson et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML, 2018.
[24] Guy Lever et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward. AAMAS, 2018.
[25] Shimon Whiteson et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. ICML, 2017.