Output Tracking Control Based on Adaptive Dynamic Programming With Multistep Policy Evaluation

In this paper, the optimal output tracking control problem of discrete-time nonlinear systems is considered. First, an augmented system is derived and the tracking control problem is converted into a regulation problem with a discounted performance index, whose solution relies on the Bellman equation. Policy iteration and value iteration are two classical algorithms for solving the Bellman equation. An analysis of the two algorithms shows that policy iteration converges quickly but requires an initial admissible control policy, whereas value iteration avoids the requirement of an initial admissible control policy but converges slowly. To achieve a tradeoff between policy iteration and value iteration, multistep heuristic dynamic programming (MsHDP) is proposed, which uses a multistep policy evaluation scheme. The convergence of the MsHDP algorithm is proved by showing that it converges to the solution of the Bellman equation. Subsequently, a neural network-based actor-critic structure is developed to implement the MsHDP algorithm. The effectiveness and advantages of the developed MsHDP method are validated through comparative simulation studies.
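To make the tradeoff concrete, the following is a minimal sketch of N-step policy evaluation on a scalar discounted linear-quadratic problem, not the paper's neural-network implementation. The system parameters (a, b, q, r, gamma) and the quadratic value parameterization V(x) = p*x^2 are illustrative assumptions. With N = 1 the loop reduces to value iteration (standard HDP); as N grows, each improvement step is followed by a more complete policy evaluation, approaching policy iteration, while the initialization V_0 = 0 means no admissible initial policy is needed.

```python
# Sketch of multistep (N-step) policy evaluation for a scalar discounted
# LQR problem: x_{t+1} = a*x + b*u, stage cost q*x^2 + r*u^2, discount gamma.
# The quadratic value function is parameterized as V(x) = p*x^2.
# All numeric values below are illustrative assumptions.
a, b, q, r, gamma = 0.9, 1.0, 1.0, 1.0, 0.95

def greedy_gain(p):
    # Policy improvement: u = -K*x minimizing q*x^2 + r*u^2 + gamma*p*(a*x + b*u)^2.
    return gamma * p * a * b / (r + gamma * p * b ** 2)

def bellman_backup(p, K):
    # One-step evaluation of the fixed policy u = -K*x under discounting.
    ac = a - b * K
    return q + r * K ** 2 + gamma * p * ac ** 2

def ms_hdp(N, iters=200):
    # Multistep HDP: alternate policy improvement with N evaluation backups.
    p = 0.0  # V_0 = 0: no admissible initial control policy is required.
    for _ in range(iters):
        K = greedy_gain(p)      # policy improvement step
        for _ in range(N):      # N-step policy evaluation
            p = bellman_backup(p, K)
    return p

p_vi = ms_hdp(1)   # N = 1: value iteration (HDP)
p_ms = ms_hdp(5)   # N = 5: multistep policy evaluation
```

Both choices of N converge to the same fixed point (the solution of the discounted Bellman equation); the larger N simply reaches it in fewer improvement steps, which mirrors the fast convergence of policy iteration without its initialization requirement.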
