Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence

This paper proposes a novel neuro-optimal control scheme for unknown nonaffine nonlinear discrete-time systems based on the adaptive dynamic programming (ADP) method. A neural identifier built on a recurrent neural network (RNN) model reconstructs the unknown system dynamics, and the convergence of the identification error is proved using Lyapunov theory. Based on the identified RNN model, the ADP method is then used to design an approximate optimal controller. Two neural networks (NNs) implement the iterative algorithm. The convergence of the action NN error and of the weight estimation errors is demonstrated while accounting for the NN approximation errors. Finally, two numerical examples demonstrate the effectiveness of the proposed control scheme.
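
To make the iterative step more concrete, the following is a minimal sketch of value iteration, one common form of iterative ADP; it is not necessarily the exact algorithm of the paper. Everything in it is an assumption made for illustration: the scalar plant `f` (nonaffine because the control enters through `tanh`), the quadratic stage cost `utility`, and the grid range are hypothetical. The plant is treated as known and the value function is stored in a lookup table, whereas the paper identifies the dynamics with an RNN and uses neural networks for the iteration.

```python
import math

N = 41                      # grid points per axis (odd, so 0.0 lies on the grid)
LO, HI = -1.0, 1.0          # assumed range for both state and control
GRID = [LO + (HI - LO) * j / (N - 1) for j in range(N)]


def f(x, u):
    """Hypothetical nonaffine plant: the control enters through tanh."""
    return 0.8 * math.sin(x) + math.tanh(u)


def utility(x, u):
    """Assumed quadratic stage cost U(x, u)."""
    return x * x + u * u


def interp(values, x):
    """Linearly interpolate tabulated values at x, clamping x to [LO, HI]."""
    x = min(max(x, LO), HI)
    pos = (x - LO) * (N - 1) / (HI - LO)
    j = min(int(pos), N - 2)
    t = pos - j
    return (1.0 - t) * values[j] + t * values[j + 1]


def bellman_backup(values):
    """One sweep: V_{i+1}(x) = min_u [U(x, u) + V_i(f(x, u))]."""
    new_values, policy = [], []
    for x in GRID:
        costs = [utility(x, u) + interp(values, f(x, u)) for u in GRID]
        best = min(range(N), key=costs.__getitem__)
        new_values.append(costs[best])
        policy.append(GRID[best])   # greedy control for this grid state
    return new_values, policy


def value_iteration(sweeps=30):
    """Iterate from V_0 = 0; return all iterates and the last greedy policy."""
    history = [[0.0] * N]
    policy = [0.0] * N
    for _ in range(sweeps):
        values, policy = bellman_backup(history[-1])
        history.append(values)
    return history, policy


if __name__ == "__main__":
    history, policy = value_iteration()
    change = max(abs(a - b) for a, b in zip(history[-1], history[-2]))
    print("largest change in the last sweep:", change)
    print("greedy control at x = 1.0:", policy[-1])
```

Starting from a zero initial value function, each sweep can only raise the tabulated values, because the stage cost is nonnegative and interpolation preserves ordering. A neural implementation would replace the table `values` and the exhaustive search over `GRID` with function approximators trained toward the same targets.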
