Deterministic Policy Gradient with Advantage Function for Fixed Wing UAV Automatic Landing

This paper addresses the automatic landing problem for fixed-wing unmanned aerial vehicles (UAVs) in the presence of a downburst. The proposed method is based on reinforcement learning. The solution consists of a path-tracking controller for the glideslope maneuver and an attitude controller for the flare maneuver. Both controllers are designed in continuous state and action spaces. Two complementary techniques are proposed within the framework of deterministic policy gradient (DPG). First, an advantage function is introduced in the critic network to improve the learning process. The proposed representation of the action-value function consists of two parts: a low-frequency part and a high-frequency part. Second, a two-stream network is developed to address partial observability, with the task treated as a partially observable Markov decision process (POMDP). The architecture synthesizes past experiences to perform policy evaluation and policy improvement, which improves the robustness of the learned policy. The performance of the approach is illustrated in flight simulations under the influence of the wind field.
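The two-part representation of the action-value function can be illustrated with a minimal sketch. The abstract does not say which term is the low-frequency part or how each part is parameterized, so the following is an assumption: the standard decomposition Q(s, a) = V(s) + A(s, a), with a state-only value term and a bounded state-action advantage term. The class name `DecomposedCritic`, the linear units, and the `tanh` bound are all hypothetical and are not taken from the paper.

```python
import math
import random


def make_linear(n_in, rng):
    """Random weights for one linear unit; the bias is stored last."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]


def linear(weights, inputs):
    """Affine map w . x + b (zip stops before the trailing bias weight)."""
    return sum(w * x for w, x in zip(weights, inputs)) + weights[-1]


class DecomposedCritic:
    """Hypothetical critic that models Q(s, a) as V(s) + A(s, a)."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Value stream: depends on the state only (assumed form).
        self.value_w = make_linear(state_dim, rng)
        # Advantage stream: depends on both state and action (assumed form).
        self.adv_w = make_linear(state_dim + action_dim, rng)

    def value(self, state):
        return linear(self.value_w, state)

    def advantage(self, state, action):
        # tanh keeps the advantage term bounded; an illustrative choice only.
        return math.tanh(linear(self.adv_w, list(state) + list(action)))

    def q(self, state, action):
        return self.value(state) + self.advantage(state, action)


critic = DecomposedCritic(state_dim=3, action_dim=1)
state, action = [0.1, -0.2, 0.3], [0.5]
print(critic.value(state), critic.advantage(state, action), critic.q(state, action))
```

In a DPG setting, the actor would be updated along the gradient of `q` with respect to the action. Under this assumed decomposition, only the advantage term contributes to that gradient, because the value term does not depend on the action.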
