Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning

Value decomposition methods have gradually become popular in cooperative multi-agent reinforcement learning. However, almost all of them follow the Individual Global Max (IGM) principle or its variants, which restricts the class of problems they can solve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy, which carries out actions, and an alter ego value function, which takes part in credit assignment. By searching explicitly over joint actions, the value function factorization dispenses with the IGM assumption. We also propose a novel anti-ego exploration mechanism to prevent the algorithm from getting stuck in local optima. As the first fully IGM-free value decomposition method, our framework achieves desirable performance in a variety of cooperative tasks.
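
To make the core mechanism concrete, the sketch below illustrates how an explicit search can replace IGM's per-agent argmax: candidate joint actions are sampled from per-agent ego policies and scored by an unconstrained centralized alter-ego value function, so no monotonicity restriction on the mixing is needed. This is a minimal illustration under our own assumptions; the names (EgoPolicy, AlterEgoValue, search_greedy_joint_action), network shapes, and sampling scheme are hypothetical, not the paper's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_AGENTS, N_ACTIONS, OBS_DIM, STATE_DIM = 3, 5, 10, 16

    class EgoPolicy(nn.Module):
        """Per-agent ego policy: local observation -> action distribution."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, N_ACTIONS))

        def forward(self, obs):
            return torch.distributions.Categorical(logits=self.net(obs))

    class AlterEgoValue(nn.Module):
        """Centralized alter-ego value over (state, joint action).
        Unconstrained mixing: no monotonicity / IGM restriction imposed."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + N_AGENTS * N_ACTIONS, 128), nn.ReLU(),
                nn.Linear(128, 1))

        def forward(self, state, joint_onehot):
            # state: (B, STATE_DIM); joint_onehot: (B, N_AGENTS * N_ACTIONS)
            return self.net(torch.cat([state, joint_onehot], dim=-1)).squeeze(-1)

    def search_greedy_joint_action(policies, value_fn, obs_list, state, k=64):
        """Explicit search in place of IGM's per-agent argmax: sample k
        candidate joint actions from the ego policies, score each with the
        alter-ego value, and return the best-scoring candidate."""
        samples = torch.stack(
            [pi(obs).sample((k,)) for pi, obs in zip(policies, obs_list)],
            dim=-1)                                        # (k, N_AGENTS)
        onehot = F.one_hot(samples, N_ACTIONS).float().reshape(k, -1)
        scores = value_fn(state.expand(k, -1), onehot)     # (k,)
        return samples[scores.argmax()]                    # best joint action

    if __name__ == "__main__":
        policies = [EgoPolicy() for _ in range(N_AGENTS)]
        value_fn = AlterEgoValue()
        obs_list = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]
        state = torch.randn(1, STATE_DIM)
        print(search_greedy_joint_action(policies, value_fn, obs_list, state))

Because the greedy joint action is found by search rather than by decentralized argmaxes, the value function is free to represent non-monotonic payoff structures that IGM-constrained mixers such as VDN or QMIX cannot.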
