Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classic reinforcement learning algorithm that has been used in various simulation environments, but its original experience replay mechanism and network structure lead to low training efficiency and slow convergence. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but because it samples uniformly it does not take advantage of important samples. This paper therefore proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay. Storing experiences by class allows important samples to be used more fully. In addition, the Critic network and the Actor network are updated asynchronously, so that the better-trained Critic network guides the Actor network update. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
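The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state how experiences are classified or how the asynchronous update is scheduled. The sketch assumes a two-class split on a reward threshold and a fixed actor update delay. The names `ClassifiedReplayBuffer`, `reward_threshold`, `high_fraction`, `should_update_actor`, and `actor_delay` are all hypothetical.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Replay buffer with two pools instead of one.

    Assumption: a transition counts as "important" when its reward
    exceeds reward_threshold. The paper may use a different criterion.
    """

    def __init__(self, capacity, reward_threshold=0.0, high_fraction=0.5, seed=None):
        self.high = deque(maxlen=capacity)  # important transitions
        self.low = deque(maxlen=capacity)   # all other transitions
        self.reward_threshold = reward_threshold
        self.high_fraction = high_fraction  # target share of each batch from `high`
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.high) + len(self.low)

    def add(self, state, action, reward, next_state, done):
        """Store the transition in the pool chosen by its reward."""
        pool = self.high if reward > self.reward_threshold else self.low
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch with a fixed share from the important pool.

        Sampling within each pool is still uniform, so temporal
        correlation between consecutive transitions is still broken.
        """
        batch_size = min(batch_size, len(self))
        n_high = min(round(batch_size * self.high_fraction), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        # If the low pool is short, top the batch up from the high pool.
        n_high = min(batch_size - n_low, len(self.high))
        batch = self.rng.sample(list(self.high), n_high)
        batch += self.rng.sample(list(self.low), n_low)
        self.rng.shuffle(batch)
        return batch


def should_update_actor(step, actor_delay=2):
    """Assumed schedule: the Critic is updated every step, the Actor
    only every actor_delay steps, so it follows a better-trained Critic."""
    return step % actor_delay == 0
```

In a training loop, the Critic would be updated on every sampled batch, and the Actor only on steps where `should_update_actor(step)` returns true. Both `high_fraction` and `actor_delay` are tuning parameters in this sketch, not values taken from the paper.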
