General value iteration-based reinforcement learning for solving the optimal tracking control problem of continuous-time affine nonlinear systems

In this paper, a novel reinforcement learning (RL) based approach is proposed to solve the optimal tracking control problem (OTCP) for continuous-time (CT) affine nonlinear systems using general value iteration (VI). First, the tracking performance criterion is formulated as a total cost without a discount term, which ensures the asymptotic stability of the tracking error. Then, several mild assumptions are introduced to relax the requirement of an initial admissible control imposed in most existing references. Under these assumptions, the general VI method is developed, and three situations are analyzed to establish convergence from any initial positive performance function. To validate the theoretical results, the proposed general VI method is implemented with two neural networks on a nonlinear spring-mass-damper system, and two situations are considered to demonstrate its effectiveness.
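To make the iteration concrete, the sketch below runs general value iteration on a coarsely discretized spring-mass-damper error system. This is an illustrative toy, not the paper's neural-network implementation: the system parameters, grids, cost weights, and the nearest-neighbor state lookup are all assumptions chosen for brevity. It does reflect two of the abstract's points: the per-step cost carries no discount factor, and the iteration may start from any nonnegative initial performance function (here the zero function).

```python
import numpy as np

# Hypothetical spring-mass-damper parameters (not taken from the paper).
m, c, k = 1.0, 0.5, 1.0
dt = 0.05                           # Euler discretization step
q1, q2, r = 1.0, 1.0, 0.1           # undiscounted quadratic cost weights

n = 41                              # grid points per state dimension
e = np.linspace(-1.0, 1.0, n)       # tracking error grid
ed = np.linspace(-1.0, 1.0, n)      # error-derivative grid
U = np.linspace(-2.0, 2.0, 9)       # finite candidate control set

def idx(x, grid):
    """Nearest grid index for each value in x, clipped to the grid."""
    step = grid[1] - grid[0]
    return np.clip(np.rint((x - grid[0]) / step).astype(int), 0, n - 1)

E, Ed = np.meshgrid(e, ed, indexing="ij")
V = np.zeros((n, n))                # any nonnegative initial performance function

def vi_step(V):
    """One sweep of value iteration: V <- min_u [ running cost + V(next state) ]."""
    Q = np.empty((len(U), n, n))
    for a, u in enumerate(U):
        # Euler step of the error dynamics m*x'' + c*x' + k*x = u toward the origin
        En = E + dt * Ed
        Edn = Ed + dt * (-(k / m) * E - (c / m) * Ed + u / m)
        cost = dt * (q1 * E**2 + q2 * Ed**2 + r * u**2)   # no discount term
        Q[a] = cost + V[idx(En, e), idx(Edn, ed)]
    return Q.min(axis=0)

for _ in range(50):
    V = vi_step(V)
```

Starting from V = 0, the iterates are monotonically nondecreasing and the value at the origin stays zero, mirroring the convergence behavior the paper analyzes (here on a crude grid rather than with neural-network approximators).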
