Deep Reinforcement Learning for Target Searching in Cognitive Electronic Warfare

The recent surge of interest in deep reinforcement learning (DRL) arises from its successes in many domains, yet its applications in practical engineering, including the optimization of control strategies in cognitive electronic warfare (CEW), remain unsatisfactory. CEW is a large and challenging undertaking, and because of the sensitivity of the data sources, few open studies have investigated it. Moreover, the spatial sparsity, continuous action space, and partially observable environment inherent in CEW greatly limit DRL algorithms that depend strongly on state-value and action-value functions. In this paper, we use Python to build a 3-D space game named Explorer that simulates various CEW environments, in which the electronic attacker is an unmanned combat air vehicle (UCAV) and the defender is an observation station, both equipped with radar as the observation sensor. In our game, the UCAV must detect the target as early as possible so that follow-up tracking and guidance tasks can be performed. To allow an "infant" UCAV to understand what "target searching" means, we train the UCAV's maneuvering strategies with a well-designed reward shaping scheme, a simplified constant-accelerated motion control, and a deep deterministic policy gradient (DDPG) algorithm based on a generative model and variational Bayesian estimation. The experimental results show that, with an operating cycle of 0.2 s, the search success rate of the trained UCAV over 10,000 episodes improves by 33.36% over the benchmark, and the target destruction rate improves by 57.84%.
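
To make the control loop concrete, the sketch below shows how a simplified constant-accelerated motion update of the kind described above might look in Python. The class and function names (`UCAVState`, `step`) and the numeric values are illustrative assumptions, not the paper's actual Explorer implementation; the DDPG actor is assumed to supply a continuous 3-D acceleration command once per 0.2 s operating cycle, which is then held constant over that interval.

```python
import numpy as np

# Hypothetical sketch of the simplified constant-accelerated motion control
# described in the abstract. The Explorer environment itself is not public,
# so all names and values here are illustrative assumptions.

DT = 0.2  # operating cycle in seconds, per the abstract


class UCAVState:
    """Position and velocity of the UCAV in 3-D space."""

    def __init__(self, position, velocity):
        self.position = np.asarray(position, dtype=float)  # (x, y, z) in metres
        self.velocity = np.asarray(velocity, dtype=float)  # (vx, vy, vz) in m/s


def step(state, acceleration, dt=DT):
    """Advance the UCAV one operating cycle under constant acceleration.

    `acceleration` is the continuous action (ax, ay, az) assumed to be chosen
    by the DDPG actor for this cycle; it is held constant over the interval.
    """
    a = np.asarray(acceleration, dtype=float)
    new_position = state.position + state.velocity * dt + 0.5 * a * dt ** 2
    new_velocity = state.velocity + a * dt
    return UCAVState(new_position, new_velocity)


if __name__ == "__main__":
    # Example: one cycle with a small lateral acceleration command.
    s = UCAVState(position=[0.0, 0.0, 1000.0], velocity=[200.0, 0.0, 0.0])
    s = step(s, acceleration=[0.0, 5.0, 0.0])
    print(s.position, s.velocity)
```

In a full training loop, the reward shaping and the variational state encoding described in the abstract would sit around this kinematic step: the shaped reward scores each transition, and the generative model compresses the radar observations into the state fed to the DDPG actor and critic.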
