An approximate Dynamic Programming based controller for an underactuated 6DoF quadrotor

This paper discusses how the principles of Adaptive Dynamic Programming (ADP) can be applied to the control of a quadrotor helicopter platform flying in an uncontrolled environment and subjected to various disturbances and model uncertainties. ADP is based on reinforcement learning using an actor-critic structure. Due to the complexity of the quadrotor system, the learning process must exploit as much information as possible about the system and the environment. Various methods to improve the learning speed and efficiency are presented. Neural networks with local activation functions are used as function approximators because the state space cannot be explored efficiently, given its size and the limited time available. The complex dynamics are controlled by a single critic and multiple actors, thus avoiding the curse of dimensionality. After a number of iterations, the actor-critic structure stores knowledge of the system dynamics and of an optimal controller that can accomplish the explicit or implicit goal specified in the cost function.
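To make the actor-critic arrangement concrete, the sketch below illustrates one possible heuristic-dynamic-programming-style iteration with Gaussian radial basis functions serving as the local activation functions: a single critic estimates the cost-to-go while separate actors handle decoupled control channels, which is one way to sidestep the curse of dimensionality. The toy dynamics, dimensions, gains, and all names here are illustrative assumptions, not the paper's model or implementation.

```python
import numpy as np

def rbf_features(x, centers, width=1.0):
    """Local Gaussian activation functions evaluated at state x."""
    d = x[None, :] - centers                      # (n_centers, n_state)
    return np.exp(-np.sum(d**2, axis=1) / (2.0 * width**2))

rng = np.random.default_rng(0)
n_state, n_centers = 4, 32                        # toy sizes (assumption)
centers = rng.uniform(-1.0, 1.0, size=(n_centers, n_state))

w_critic = np.zeros(n_centers)                    # single critic: V(x) ~ w . phi(x)
w_actors = [np.zeros(n_centers) for _ in range(2)]  # one actor per control channel

def step(x, u):
    """Toy linear stand-in for the quadrotor dynamics (assumption)."""
    A = np.eye(n_state) * 0.95
    B = np.zeros((n_state, 2)); B[1, 0] = B[3, 1] = 0.1
    return A @ x + B @ u

gamma, lr_c, lr_a = 0.95, 0.1, 0.05
x = rng.uniform(-0.5, 0.5, n_state)
for _ in range(200):
    phi = rbf_features(x, centers)
    u = np.array([w @ phi for w in w_actors])     # actors propose controls
    x_next = step(x, u)
    cost = x @ x + 0.1 * (u @ u)                  # quadratic stage cost
    phi_next = rbf_features(x_next, centers)

    # Critic: temporal-difference update toward cost + gamma * V(x')
    td = cost + gamma * (w_critic @ phi_next) - (w_critic @ phi)
    w_critic += lr_c * td * phi

    # Actors: nudge each channel to reduce the critic's estimate; the
    # gradient of V(x') w.r.t. each input is taken by finite differences
    for i, w in enumerate(w_actors):
        du = np.zeros(2); du[i] = 1e-3
        dV = (w_critic @ rbf_features(step(x, u + du), centers)
              - w_critic @ phi_next) / 1e-3
        w -= lr_a * (0.2 * u[i] + gamma * dV) * phi
    x = x_next
```

In this sketch each actor's weights are adjusted along the gradient of the stage cost plus the critic's discounted cost-to-go, so the critic's accumulated knowledge of the dynamics shapes all control channels at once; the local RBF features mean each update only affects the region of the state space actually visited.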
