Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

Recently, reinforcement learning has made remarkable achievements in the fields of natural science, engineering, medicine and operational research. Reinforcement learning addresses sequence problems and considers long-term returns. This long-term view of reinforcement learning is critical to find the optimal solution of many problems. The existing multi- agent reinforcement learning algorithms have the problem of overestimation in estimating the Q value. Unfortunately, there have not been many studies on overestimation of agent reinforcement learning, which will affect the learning efficiency of reinforcement learning. Based on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor network and critic network, optimizes the overestimation of Q value and adopts the update delayed method to make the actor training more stable. In order to test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG and DQN methods in the simulation environment.