Deep reinforcement learning for quadrotor path following with adaptive velocity

This paper proposes a solution to the path following problem for a quadrotor vehicle based on deep reinforcement learning. Three approaches implementing the Deep Deterministic Policy Gradient (DDPG) algorithm are presented, each an improved version of the preceding one. The first approach uses only instantaneous information about the path. The second approach adds a structure that allows the agent to anticipate upcoming curves. The third approach is additionally capable of computing the optimal velocity according to the path’s shape. A training framework is built that combines the TensorFlow-Python environment with Gazebo-ROS using the RotorS simulator. The three agents are tested in RotorS and experimentally on the AscTec Hummingbird quadrotor. Experimental results validate the agents, which achieve a generalized solution to the path following problem.
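To make the distinction between instantaneous and anticipative path information concrete, the following is a minimal sketch, not the state definition used in the paper. It assumes a planar path given as a list of distinct waypoints; the function names, the `lookahead` parameter, and the choice of the three returned quantities are all hypothetical and serve only as illustration. The cross-track and heading errors depend only on the closest point of the path, whereas the lookahead angle describes how much the path turns ahead of the vehicle.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def closest_point(position, path):
    """Return (tangent angle, arc length, point) of the closest point on the polyline.

    Assumes consecutive waypoints are distinct (no zero-length segments).
    """
    px, py = position
    best = None
    s_start = 0.0  # arc length at the start of the current segment
    for (ax, ay), (bx, by) in zip(path[:-1], path[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        # Project the position onto the segment and clamp to its end points.
        t = ((px - ax) * dx + (py - ay) * dy) / (seg_len ** 2)
        t = min(1.0, max(0.0, t))
        qx, qy = ax + t * dx, ay + t * dy
        dist = math.hypot(px - qx, py - qy)
        if best is None or dist < best[0]:
            best = (dist, math.atan2(dy, dx), s_start + t * seg_len, (qx, qy))
        s_start += seg_len
    return best[1], best[2], best[3]


def tangent_at(path, s):
    """Tangent angle of the polyline at arc length s (clamped to the last segment)."""
    s_start = 0.0
    last = len(path) - 2
    for i, ((ax, ay), (bx, by)) in enumerate(zip(path[:-1], path[1:])):
        seg_len = math.hypot(bx - ax, by - ay)
        if s <= s_start + seg_len or i == last:
            return math.atan2(by - ay, bx - ax)
        s_start += seg_len


def path_state(position, yaw, path, lookahead=1.0):
    """Hypothetical path-following state: (cross_track, heading_error, lookahead_angle)."""
    tangent, s, (qx, qy) = closest_point(position, path)
    ex, ey = position[0] - qx, position[1] - qy
    # Signed distance: positive when the vehicle is to the left of the path direction.
    cross_track = -math.sin(tangent) * ex + math.cos(tangent) * ey
    # Instantaneous information: path direction relative to the vehicle heading.
    heading_error = wrap_angle(tangent - yaw)
    # Anticipative information: change of path direction `lookahead` metres ahead.
    lookahead_angle = wrap_angle(tangent_at(path, s + lookahead) - tangent)
    return cross_track, heading_error, lookahead_angle


# An L-shaped path with a 90 degree left turn at (2, 0).
path = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
state_near = path_state((1.0, 0.5), 0.0, path, lookahead=0.5)
state_far = path_state((1.0, 0.5), 0.0, path, lookahead=1.5)
```

With the short lookahead the turn is not yet visible and the third component is zero; with the longer lookahead it reports the upcoming 90 degree turn while the first two components remain unchanged. An agent that receives such a quantity has, in principle, the information needed to react before reaching the curve.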
