Learning to Guide: Guidance Law Based on Deep Meta-Learning and Model Predictive Path Integral Control

In this paper, we present a novel guidance scheme based on a model-based deep reinforcement learning (RL) technique. In this approach, a deep neural network is trained as a predictive model of the guidance dynamics and incorporated into a model predictive path integral (MPPI) control framework. The standard MPPI framework, however, assumes that the actual environment matches the training data of the neural network, an assumption that rarely holds in practice owing to target maneuvers, other perturbations, and actuator failures. To address this problem, our method uses meta-learning to adapt the deep neural dynamics model to such changes online, alleviating the performance degradation of standard MPPI control caused by the mismatch between the actual environment and the training data. Building on these techniques, we construct a novel guidance law for a varying-velocity interceptor engaging a maneuvering target with a desired terminal impact angle under actuator failure. Simulation and experimental results across different scenarios demonstrate the effectiveness and robustness of the proposed guidance law in achieving successful interceptions of maneuvering targets.
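To make the MPPI component concrete, the following is a minimal sketch of one path-integral planning step with a learned dynamics model. The names (`dynamics`, `cost_fn`, `mppi_step`) and all hyperparameter values are our illustrative assumptions, not the paper's implementation; `dynamics` stands in for the trained neural network described above.

```python
# Minimal MPPI sketch (hypothetical names and parameters).
import numpy as np

def mppi_step(dynamics, cost_fn, state, u_init,
              n_samples=500, sigma=0.5, lam=1.0):
    """One MPPI planning step over a learned dynamics model.

    dynamics: callable (state, control) -> next_state (the learned model).
    cost_fn:  callable (state, control) -> scalar running cost.
    u_init:   (horizon,) nominal control sequence, warm-started
              from the previous step.
    lam:      temperature of the path-integral weighting.
    """
    horizon = len(u_init)
    noise = sigma * np.random.randn(n_samples, horizon)
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        s = state
        for t in range(horizon):
            u = u_init[t] + noise[k, t]
            s = dynamics(s, u)          # roll out the learned model
            costs[k] += cost_fn(s, u)
    # Path-integral (softmin) weights over the sampled rollouts.
    beta = costs.min()
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()
    # Weighted-perturbation update of the nominal control sequence.
    return u_init + (w[:, None] * noise).sum(axis=0)
```

In a receding-horizon loop, only the first element of the returned sequence is applied as the guidance command; the sequence is then shifted forward and the step is repeated.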

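The online adaptation component can be sketched analogously. Below is a hedged, MAML-style inner loop that adapts the dynamics model from the most recent in-flight transitions, so that target maneuvers or actuator failures are absorbed into the model; `model`, `meta_state_dict`, and the step counts are our illustrative assumptions rather than the paper's exact procedure.

```python
# Hedged sketch of the online meta-adaptation step (assumed names).
import copy
import torch
import torch.nn.functional as F

def adapt_online(model, meta_state_dict, recent_batch,
                 inner_lr=1e-3, n_steps=3):
    """MAML-style inner loop run at each control step.

    recent_batch: (states, controls, next_states) tensors from the
                  last K transitions observed in flight.
    """
    # Adapt from the meta-learned prior each time, not from the
    # previously adapted weights, to keep adaptation stable.
    model.load_state_dict(copy.deepcopy(meta_state_dict))
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    states, controls, next_states = recent_batch
    for _ in range(n_steps):
        pred = model(torch.cat([states, controls], dim=-1))
        loss = F.mse_loss(pred, next_states)  # one-step prediction error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model  # adapted dynamics model handed to the MPPI planner
```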