Value Iteration-Based H∞ Controller Design for Continuous-Time Nonlinear Systems Subject to Input Constraints

In this paper, a novel integral reinforcement learning method based on value iteration (VI) is proposed to design the <inline-formula> <tex-math notation="LaTeX">$H_{\infty }$ </tex-math></inline-formula> controller for continuous-time nonlinear systems subject to input constraints. To handle the input constraints, a nonquadratic functional is introduced to reconstruct the <inline-formula> <tex-math notation="LaTeX">${L_{2}}$ </tex-math></inline-formula>-gain condition of the <inline-formula> <tex-math notation="LaTeX">$H_{\infty }$ </tex-math></inline-formula> control problem. A VI method is then proposed to solve the corresponding Hamilton–Jacobi–Isaacs (HJI) equation, initialized with an arbitrary positive semi-definite value function. In contrast to most existing methods developed on the basis of policy iteration, no initial admissible control policy is required, which allows a much freer choice of initial condition. The iterative process of the proposed VI method is analyzed, and convergence to the saddle-point solution is proved under general conditions. In the implementation, only one neural network is introduced to approximate the iterative value function, which yields a simpler architecture with a lighter computational load than the usual three-network structure. Two nonlinear examples are presented to verify the effectiveness of the VI-based method.
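The core iteration described above can be sketched on a toy problem. The snippet below is a minimal, illustrative value iteration for a constrained-input zero-sum game on a discretized scalar system: the value function starts from zero (an arbitrary positive semi-definite choice, with no admissible initial policy needed), each sweep performs the saddle-point update min over the bounded control and max over the disturbance, and the control penalty uses the standard nonquadratic cost built from the inverse hyperbolic tangent. The dynamics, grids, and parameters (`lam`, `gamma`, `dt`) are assumptions for this sketch, not the paper's neural-network implementation.

```python
import numpy as np

# Illustrative parameters (assumptions): input bound, L2-gain level, Euler step.
lam, gamma, dt = 1.0, 2.0, 0.05
xs = np.linspace(-1.0, 1.0, 81)                    # state grid
us = np.linspace(-0.999 * lam, 0.999 * lam, 41)    # constrained control grid
ws = np.linspace(-0.5, 0.5, 21)                    # disturbance grid

def W(u):
    """Nonquadratic control cost 2*int_0^u lam*atanh(v/lam) dv, closed form."""
    return 2 * lam * u * np.arctanh(u / lam) + lam**2 * np.log(1 - (u / lam)**2)

def step(x, u, w):
    """One Euler step of the toy dynamics x_dot = -x + u + w (assumed)."""
    return x + dt * (-x + u + w)

V = np.zeros_like(xs)                              # arbitrary PSD initial value
U, Wd = np.meshgrid(us, ws, indexing="ij")         # all (u, w) pairs
for _ in range(400):                               # VI sweeps
    Vnew = np.empty_like(V)
    for i, x in enumerate(xs):
        xn = np.clip(step(x, U, Wd), xs[0], xs[-1])
        stage = dt * (x**2 + W(U) - gamma**2 * Wd**2)
        Q = stage + np.interp(xn, xs, V)           # one-step cost-to-go
        Vnew[i] = Q.max(axis=1).min()              # saddle point: min_u max_w
    if np.max(np.abs(Vnew - V)) < 1e-8:            # stop when iterates settle
        V = Vnew
        break
    V = Vnew
```

On this toy problem the iterates stay nonnegative and the value is smallest at the origin, mirroring the monotone convergence behavior analyzed in the paper; the paper replaces the grid interpolation with a single neural-network approximator of the value function.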
