Strategy Generation Based on Reinforcement Learning with Deep Deterministic Policy Gradient for UCAV

Unmanned combat aerial vehicles (UCAVs) are essential participants in future air combat. Because the air-combat process is highly dynamic and random, traditional methods have difficulty obtaining an optimal maneuvering strategy. This paper applies reinforcement learning (RL) to the problem, using the deep deterministic policy gradient (DDPG) algorithm to handle the high-dimensional, continuous action space. A method based on a temporary replay buffer is also proposed to improve the efficiency of neural network training. A 3-D air-combat environment is built to verify the proposed algorithm. The results show that an agent using the strategy obtained by RL with DDPG gains a high advantage during the confrontation, and that the temporary replay buffer greatly improves the training efficiency of the neural network.
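
To make the replay-buffer idea concrete, the following is a minimal sketch and not the implementation used in this paper. The abstract does not describe how the temporary replay buffer operates, so the staging behavior here is an assumption: transitions from the current episode are held in a temporary list and moved into the main buffer only when the episode ends. The class name `StagedReplayBuffer` and the methods `stage`, `commit`, and `sample` are hypothetical names chosen for illustration. The uniform sampling from a fixed-capacity buffer follows standard DDPG practice.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One transition: (state, action, reward, next_state, done)
Transition = Tuple[List[float], List[float], float, List[float], bool]


class StagedReplayBuffer:
    """Uniform replay buffer with a per-episode temporary staging list.

    The staging step is an illustrative assumption, not the paper's
    documented design.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # Main buffer: the oldest transitions are dropped once capacity is reached.
        self.main: Deque[Transition] = deque(maxlen=capacity)
        # Temporary buffer: holds transitions of the episode in progress.
        self.temp: List[Transition] = []
        self.rng = random.Random(seed)

    def stage(self, transition: Transition) -> None:
        """Store a transition in the temporary buffer only."""
        self.temp.append(transition)

    def commit(self, keep: bool = True) -> int:
        """End the episode: move staged transitions to the main buffer,
        or discard them when keep is False. Returns the number kept."""
        kept = len(self.temp) if keep else 0
        if keep:
            self.main.extend(self.temp)
        self.temp.clear()
        return kept

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a uniform minibatch (without replacement) from the main buffer."""
        if batch_size > len(self.main):
            raise ValueError("not enough committed transitions to sample")
        return self.rng.sample(list(self.main), batch_size)

    def __len__(self) -> int:
        return len(self.main)
```

In a training loop under these assumptions, the agent would call `stage` after every environment step, call `commit` at the end of each episode, and call `sample` to obtain minibatches for the critic and actor updates. Whether and how staged transitions are filtered or reweighted before being committed is left open here, because the abstract does not say.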