Reinforcement Learning-Based Optimal Stabilization for Unknown Nonlinear Systems Subject to Inputs With Uncertain Constraints

This article presents a novel reinforcement learning strategy for the optimal stabilization of unknown nonlinear systems subject to uncertain input constraints. The control algorithm consists of two parts: online learning optimal control for the nominal system, and feedforward neural network (NN) compensation for the uncertain input constraints, which are treated as saturation nonlinearities. By combining input–output data with a recurrent NN, a Luenberger observer is established to approximate the unknown system dynamics. For the nominal system without input constraints, the online learning optimal control policy is derived by solving the Hamilton–Jacobi–Bellman equation with a critic NN alone. Once the uncertain input constraints are transformed into saturation nonlinearities, they are compensated by a feedforward NN compensator. Lyapunov stability analysis shows that the closed-loop system is uniformly ultimately bounded. Finally, simulation studies illustrate the effectiveness of the developed stabilization scheme.
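
As a rough illustration of the two-part structure described above, the following sketch writes out a standard form of the nominal optimal control problem and the saturation decomposition. The abstract does not state these equations. The affine dynamics $f$, $g$, the quadratic cost weights $Q$, $R$, the critic basis $\sigma_c$, the weight estimate $\hat{W}_c$, and the residual $\Delta(u)$ are all assumed notation for illustration and may differ from the article's own formulation.

```latex
% Assumed nominal dynamics (affine in the input) and infinite-horizon cost
\dot{x} = f(x) + g(x)\,u, \qquad
V(x(t)) = \int_{t}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) d\tau

% Hamilton-Jacobi-Bellman equation and the resulting optimal policy
0 = x^{\top} Q x + u^{*\top} R u^{*}
    + \left( \nabla V^{*}(x) \right)^{\top} \left( f(x) + g(x)\,u^{*} \right),
\qquad
u^{*}(x) = -\tfrac{1}{2} R^{-1} g^{\top}(x)\, \nabla V^{*}(x)

% Critic-only approximation of the value function and the policy it induces
\hat{V}(x) = \hat{W}_c^{\top} \sigma_c(x), \qquad
\hat{u}(x) = -\tfrac{1}{2} R^{-1} g^{\top}(x)\, \nabla \sigma_c^{\top}(x)\, \hat{W}_c

% Input constraint viewed as a saturation nonlinearity: the residual
% \Delta(u) is the term a feedforward NN compensator would approximate
\operatorname{sat}(u) = u + \Delta(u), \qquad
\Delta(u) = \operatorname{sat}(u) - u
```

Under these assumptions, the critic NN handles the unconstrained nominal problem, while the compensator only needs to learn the residual $\Delta(u)$ between the commanded and the saturated input.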
