Cooperative online Guide-Launch-Guide policy in a target-missile-defender engagement using deep reinforcement learning

Abstract A target-missile-defender engagement is considered, in which the missile attempts to intercept the target and the defender tries to prevent this interception by intercepting the missile. In this engagement, finding the optimal launch time of the defender and the optimal target guidance laws before and after launch, which can be formulated as a switched-system optimization problem, is crucial for improving the performance of the target-defender team. The objective of this paper is to examine the potential of deep reinforcement learning for switched-system optimization. To that end, we propose estimating the optimal launch time of the defender and the optimal guidance law of the target online, using a reinforcement-learning-based method. A policy suggesting, at each decision time, the bang-bang target maneuver and whether or not to launch the defender was obtained and analyzed via simulations. The simulations showed that the reinforcement-learning-based method attains a close-to-optimal level of performance in terms of the proposed cost function.
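The joint decision described above, a bang-bang target maneuver paired with a binary launch decision at each decision time, can be sketched as a policy over a small discrete action space. The following is a minimal illustrative sketch, not the paper's implementation: it assumes a linear softmax policy over the four joint (maneuver, launch) actions, updated with a REINFORCE-style gradient; the state vector and all names are hypothetical.

```python
import numpy as np

# Joint discrete action space: a bang-bang maneuver command (maximum
# acceleration in either direction) crossed with a binary launch flag.
MANEUVERS = (-1.0, +1.0)   # bang-bang target acceleration command
LAUNCH = (0, 1)            # 0: hold the defender, 1: launch now
ACTIONS = [(m, l) for m in MANEUVERS for l in LAUNCH]

def softmax(z):
    z = z - z.max()        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class GuideLaunchGuidePolicy:
    """Linear softmax policy over the 4 joint (maneuver, launch) actions."""

    def __init__(self, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(len(ACTIONS), state_dim))

    def action_probs(self, state):
        return softmax(self.W @ state)

    def act(self, state, rng):
        p = self.action_probs(state)
        idx = rng.choice(len(ACTIONS), p=p)
        return idx, ACTIONS[idx]

    def reinforce_update(self, state, action_idx, advantage, lr=0.1):
        # REINFORCE: for a linear softmax policy,
        # grad log pi(a|s) = (1{a} - pi(.|s)) outer s
        p = self.action_probs(state)
        grad = -np.outer(p, state)
        grad[action_idx] += state
        self.W += lr * advantage * grad

# One decision step on a toy 4-dimensional engagement state
# (e.g., relative ranges and closing speeds; purely illustrative).
rng = np.random.default_rng(1)
policy = GuideLaunchGuidePolicy(state_dim=4)
state = np.array([1.0, 0.2, -0.5, 0.3])
idx, (maneuver, launch) = policy.act(state, rng)
policy.reinforce_update(state, idx, advantage=1.0)
```

In the paper's setting the advantage signal would come from the engagement cost function, and the policy would be queried at each decision time until the launch flag fires, after which only the maneuver component remains relevant.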
