Paper Information - Strategy Generation Based on Reinforcement Learning with Deep Deterministic Policy Gradient for UCAV

Unmanned combat aerial vehicles (UCAVs) are essential participants in future air combat. Because the air-combat process is highly dynamic and random, traditional methods struggle to obtain an optimal maneuvering strategy. This paper applies reinforcement learning (RL) to the problem, using the deep deterministic policy gradient (DDPG) algorithm to handle the high-dimensional, continuous action space. A method based on a temporary replay buffer is proposed to improve the efficiency of neural network training. A 3-D air-combat environment is built to verify the proposed algorithm. The results show that an agent using the strategy learned by RL with DDPG gains a strong advantage during the confrontation, and that the temporary replay buffer substantially improves the training efficiency of the neural network.
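The abstract does not say how the temporary replay buffer works, so the following Python sketch shows only one possible reading and should not be taken as the authors' method. It assumes that transitions from the current episode are staged in a temporary list and are copied into the main DDPG replay buffer only when the episode ends. The class names `ReplayBuffer` and `StagedReplayBuffer`, the `end_episode` method, and the `keep` flag are all hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in standard DDPG training.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


class StagedReplayBuffer:
    """Hypothetical wrapper: transitions wait in a temporary list per episode."""

    def __init__(self, capacity):
        self.main = ReplayBuffer(capacity)
        self.temporary = []

    def add(self, transition):
        # New transitions are staged and are not yet visible to the learner.
        self.temporary.append(transition)

    def end_episode(self, keep=True):
        # Assumed rule: the caller decides whether the episode is worth keeping.
        if keep:
            for transition in self.temporary:
                self.main.add(transition)
        self.temporary.clear()
```

Under this assumption, a training loop would call `add` at every environment step, call `end_episode` once per episode, and draw minibatches from `main.sample` for the actor and critic updates.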
