Online Multi-Objective Model-Independent Adaptive Tracking Mechanism for Dynamical Systems

The optimal tracking problem is addressed in the robotics literature using a variety of robust and adaptive control approaches. However, these schemes have implementation limitations: restricted applicability in uncertain dynamical environments when complete or partial model-based control structures are used, complexity and integrity issues in discrete-time environments, and poor scalability to complex coupled dynamical systems. An online adaptive learning mechanism is developed to tackle these limitations and provide a generalized solution platform for a class of tracking control problems. The scheme minimizes the tracking errors and optimizes the overall dynamical behavior using simultaneous linear feedback control strategies. Reinforcement learning approaches based on value iteration are adopted to solve the underlying Bellman optimality equations. The resulting control strategies are updated in real time in an interactive manner, without requiring any information about the dynamics of the underlying systems. Adaptive critics are employed to approximate the optimal value functions and the associated control strategies in real time. The proposed adaptive tracking mechanism is illustrated in simulation by controlling a flexible wing aircraft in an uncertain aerodynamic learning environment.
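As a rough illustration of the value iteration idea mentioned above, the following sketch iterates the Bellman backup for a quadratic value function V(x) = x^T P x and a linear feedback u = -K x on a tracking-error state. It is not the method of this work: it assumes a known linear model (A, B) purely for illustration, whereas the scheme described above is model-independent and uses adaptive critics. The function name, the weights Q and R, and the double-integrator example are all hypothetical.

```python
import numpy as np

def value_iteration_lqr(A, B, Q, R, iters=5000, tol=1e-10):
    """Value iteration for V_k(x) = x^T P_k x with x_{k+1} = A x_k + B u_k.

    Illustrative only: A and B are assumed known here.
    """
    n = A.shape[0]
    P = np.zeros((n, n))  # initial value estimate V_0 = 0
    for _ in range(iters):
        # Greedy linear feedback u = -K x for the current value estimate
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Bellman backup: stage cost plus value of the successor state
        A_cl = A - B @ K
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Feedback gain that is greedy with respect to the final value estimate
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical example: discretized double integrator as the error dynamics
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)           # tracking-error weight (assumed)
R = np.array([[1.0]])   # control-effort weight (assumed)

P, K = value_iteration_lqr(A, B, Q, R)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("closed-loop spectral radius:", spectral_radius)
```

A spectral radius below one indicates that the resulting feedback drives the tracking error to zero for this assumed model; a model-free variant would estimate the same quantities from measured data instead of from A and B.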
