SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments

Cooperative problems under continuous control have long been a central focus of multi-agent reinforcement learning. Existing algorithms suffer from increasingly uneven learning across agents as the number of agents grows. In this paper, a new multi-agent actor-critic structure is proposed: a self-attention mechanism is applied in the critic network, and a value decomposition method is used to address this unevenness. The proposed algorithm makes full use of the samples in the replay buffer to learn the behavior of a class of agents. First, a new update method for the policy networks is proposed that improves learning efficiency. Second, sample utilization is improved while reflecting the ability of perspective-taking among groups. Finally, the "deceptive signal" in training is eliminated, and the learning degree among agents is more uniform than in existing methods. Multiple experiments were conducted in two typical scenarios of the multi-agent particle environment. Experimental results show that the proposed algorithm outperforms state-of-the-art methods and exhibits higher learning efficiency as the number of agents increases.
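
To make the described architecture concrete, the following is a minimal sketch of a centralized critic that applies self-attention over per-agent observation-action encodings and sums per-agent values as a simple value-decomposition step. The module name, layer sizes, and the summation-based decomposition are illustrative assumptions, not the authors' exact SA-MATD3 design.

```python
# Illustrative sketch only: self-attention critic with a simple additive
# value decomposition, in the spirit described in the abstract.
import torch
import torch.nn as nn


class SelfAttentionCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 128, n_heads: int = 4):
        super().__init__()
        # Shared encoder maps each agent's (observation, action) pair to an embedding.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, embed_dim),
            nn.ReLU(),
        )
        # Self-attention lets each agent's embedding attend to all agents,
        # so the critic can weight teammates' contributions state by state.
        self.attention = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        # Per-agent Q head; summing per-agent values is an assumed stand-in
        # for the value-decomposition step mentioned in the abstract.
        self.q_head = nn.Linear(embed_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim), act: (batch, n_agents, act_dim)
        x = self.encoder(torch.cat([obs, act], dim=-1))    # (batch, n_agents, embed_dim)
        attended, _ = self.attention(x, x, x)              # (batch, n_agents, embed_dim)
        per_agent_q = self.q_head(attended).squeeze(-1)    # (batch, n_agents)
        return per_agent_q.sum(dim=-1, keepdim=True)       # joint Q value: (batch, 1)
```

In a TD3-style setup, two such critics would be trained with clipped double-Q targets while the per-agent actors are updated from the shared, attention-weighted value estimate.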
