Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics

An online adaptive optimal control scheme is proposed for continuous-time nonlinear systems with completely unknown dynamics. It is achieved by developing a novel identifier-critic approximate dynamic programming algorithm with a dual neural network (NN) approximation structure. First, an adaptive NN identifier is designed to obviate the need for complete knowledge of the system dynamics, and a critic NN is employed to approximate the optimal value function. Then, the optimal control law is computed from the information provided by the identifier NN and the critic NN, so that no actor NN is needed. In particular, a novel adaptive law design method based on the parameter estimation error is proposed to update the weights of both the identifier NN and the critic NN simultaneously online, so that they converge to small neighbourhoods of their ideal values. The stability of the closed-loop system and the convergence of the control law to a small vicinity of the optimal solution are both proved by means of Lyapunov theory. The proposed adaptation algorithm is further improved to achieve finite-time convergence of the NN weights. Finally, simulation results are provided to exemplify the efficacy of the proposed methods.
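For illustration, a minimal sketch of the generic structure behind such identifier-critic designs (assumed notation; not necessarily the paper's exact formulation): for an input-affine system $\dot{x} = f(x) + g(x)u$ with running cost $Q(x) + u^{\mathrm T} R u$, the critic NN approximates the optimal value function as $V^*(x) \approx \hat{W}_c^{\mathrm T} \phi(x)$, and the control law follows directly from the critic weights and the identifier's estimate $\hat{g}(x)$ of the input dynamics,

$$ \hat{u}(x) = -\tfrac{1}{2} R^{-1} \hat{g}(x)^{\mathrm T} \nabla\phi(x)^{\mathrm T} \hat{W}_c , $$

so that no separate actor NN is needed to represent the policy.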
