Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control

In this paper, a deep reinforcement learning-based robust control strategy for quadrotor helicopters is proposed. The quadrotor is controlled by a learned neural network that maps the system states directly to control commands in an end-to-end fashion. The learning algorithm is based on the deterministic policy gradient algorithm. Introducing an integral compensator into the actor-critic structure greatly enhances tracking accuracy and robustness. Moreover, a two-phase learning protocol comprising an offline and an online learning phase is proposed for practical implementation. An offline policy is first learned on a simplified quadrotor model; the policy is then optimized online during actual flight. The proposed approach is evaluated in a flight simulator. The results demonstrate that the offline-learned policy is highly robust to model errors and external disturbances. They also show that online learning can significantly improve control performance.
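To illustrate why an integral term can help a deterministic policy reject a constant disturbance, the following is a minimal sketch and not the paper's implementation. It assumes a toy first-order plant x' = u + d, a hand-set linear policy in place of the learned actor network, and a state augmented with the running integral of the tracking error; the function name `rollout`, the gains, and the disturbance value are all hypothetical.

```python
import numpy as np


def rollout(weights, disturbance=0.5, dt=0.01, steps=5000):
    """Simulate x' = u + d under the deterministic policy u = weights @ [e, z].

    e is the tracking error and z its running integral. The plant and the
    linear policy are illustrative assumptions, not the paper's quadrotor
    model or learned actor.
    """
    x, z, target = 0.0, 0.0, 1.0
    for _ in range(steps):
        e = x - target                 # tracking error
        z += e * dt                    # integral of the tracking error
        s = np.array([e, z])           # integral-augmented state
        u = float(weights @ s)         # deterministic action
        x += (u + disturbance) * dt    # forward-Euler step of the toy plant
    return abs(x - target)


# Policy that ignores the integral state (second weight is zero).
err_plain = rollout(np.array([-2.0, 0.0]))
# Same proportional gain, plus feedback on the integral state.
err_integral = rollout(np.array([-2.0, -1.0]))

print(f"final |error| without integral state: {err_plain:.4f}")
print(f"final |error| with integral state:    {err_integral:.4f}")
```

In this toy setting the policy without the integral state settles at a nonzero offset of d / 2 = 0.25, while the integral-augmented policy drives the error toward zero. In an actor-critic setup, the same augmented state would presumably be fed to both the actor and the critic, but that detail is an assumption here.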
