AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual Intrinsic Reward Mixing Network

Deducing each agent's contribution and assigning it a corresponding reward is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL). Previous studies address this issue by designing an intrinsic reward function, but they combine the intrinsic reward with the environment reward by simple summation, which limits the performance of their MARL frameworks. We propose a novel method named Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX) for MARL. The contributions of AIIR-MIX are as follows: (a) we construct a novel intrinsic reward network based on the attention mechanism to make teamwork more effective; (b) we propose a mixing network that combines intrinsic and extrinsic rewards non-linearly and dynamically in response to changing conditions of the environment. We compare AIIR-MIX with state-of-the-art (SOTA) MARL methods on battle games in StarCraft II, and the results demonstrate that AIIR-MIX performs admirably, surpassing the current advanced methods in average test win rate. To validate the effectiveness of AIIR-MIX, we conduct additional ablation studies. The results show that AIIR-MIX dynamically assigns each agent a real-time intrinsic reward in accordance with its actual contribution.
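
The abstract provides no implementation details, so the following PyTorch sketch is only a rough illustration of the two components it names: an attention-based per-agent intrinsic reward network and a state-conditioned non-linear mixer of intrinsic and extrinsic rewards. All module names, dimensions, and the exact attention and mixing formulas below are our assumptions, not the authors' published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionIntrinsicReward(nn.Module):
    # Hypothetical attention-based individual intrinsic reward module:
    # each agent's observation-action embedding attends over all agents'
    # embeddings, and the attended summary is mapped to a scalar reward.
    def __init__(self, obs_act_dim, embed_dim=64):
        super().__init__()
        self.embed = nn.Linear(obs_act_dim, embed_dim)
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.reward_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, obs_act):                       # (batch, n_agents, obs_act_dim)
        h = torch.relu(self.embed(obs_act))
        q, k, v = self.q_proj(h), self.k_proj(h), self.v_proj(h)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        attn = torch.softmax(scores, dim=-1)          # attention over agents
        context = torch.matmul(attn, v)
        return self.reward_head(context).squeeze(-1)  # (batch, n_agents) intrinsic rewards

class RewardMixer(nn.Module):
    # Hypothetical non-linear mixer: the global state generates the weights of a
    # small hypernetwork-style MLP, so the blend of each agent's intrinsic reward
    # with the shared extrinsic reward can change with the environment state.
    def __init__(self, state_dim, n_agents, hidden_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.w1 = nn.Linear(state_dim, 2 * hidden_dim)   # weights for [r_in_i, r_ex]
        self.b1 = nn.Linear(state_dim, hidden_dim)
        self.w2 = nn.Linear(state_dim, hidden_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, r_intrinsic, r_extrinsic, state):
        # r_intrinsic: (batch, n_agents); r_extrinsic: (batch, 1); state: (batch, state_dim)
        batch = state.shape[0]
        r_ex = r_extrinsic.expand(-1, self.n_agents)
        r = torch.stack([r_intrinsic, r_ex], dim=-1)          # (batch, n_agents, 2)
        w1 = torch.abs(self.w1(state)).view(batch, 2, -1)     # (batch, 2, hidden)
        b1 = self.b1(state).unsqueeze(1)                      # (batch, 1, hidden)
        hidden = F.elu(torch.matmul(r, w1) + b1)              # (batch, n_agents, hidden)
        w2 = torch.abs(self.w2(state)).view(batch, -1, 1)     # (batch, hidden, 1)
        b2 = self.b2(state).view(batch, 1, 1)
        return (torch.matmul(hidden, w2) + b2).squeeze(-1)    # (batch, n_agents)

# Toy usage with made-up sizes: 4 agents, 10-dim obs+action, 20-dim global state.
batch, n_agents, obs_act_dim, state_dim = 8, 4, 10, 20
intrinsic_net = AttentionIntrinsicReward(obs_act_dim)
mixer = RewardMixer(state_dim, n_agents)
r_in = intrinsic_net(torch.randn(batch, n_agents, obs_act_dim))
r_mix = mixer(r_in, torch.randn(batch, 1), torch.randn(batch, state_dim))
print(r_mix.shape)  # torch.Size([8, 4]): one blended per-agent reward signal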
