Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances

In this paper, an optimal control method is developed for unknown continuous-time systems subject to unknown disturbances. An integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control law, and off-policy learning allows the system dynamics to be completely unknown. Neural networks are used to construct the critic and action networks. It is shown that, in the presence of unknown disturbances, off-policy IRL may fail to converge or may yield biased solutions. To reduce the influence of the unknown disturbances, a disturbance compensation controller is added. Based on Lyapunov techniques, it is proven that the neural network weight errors are uniformly ultimately bounded, and convergence of the Hamiltonian function is also established. A simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
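To make the IRL idea concrete: the standard interval Bellman equation from the IRL literature, V(x(t)) = \int_t^{t+T} r(x(\tau), u(\tau)) d\tau + V(x(t+T)), lets the critic weights be estimated from trajectory data alone, without knowledge of the drift dynamics. The sketch below is a minimal on-policy illustration of this equation on a scalar linear plant; the plant constants, the Euler integration, and the known-input-gain actor update are assumptions made for the toy example, not the paper's off-policy, disturbance-compensated design.

```python
import numpy as np

# Minimal on-policy IRL sketch for a scalar linear plant
#   dx/dt = a*x + b*u,  running cost r = q*x^2 + R*u^2,
# with quadratic critic V(x) = w * x^2.  The interval Bellman equation
#   V(x(t)) = integral_{t}^{t+T} r dτ + V(x(t+T))
# gives  w * (x(t)^2 - x(t+T)^2) = accumulated interval cost,
# which is solved for w by least squares; the actor is then improved.
# All constants and the known-b actor update are toy assumptions.

a, b, q, R = -1.0, 1.0, 1.0, 1.0    # assumed plant and cost parameters
dt, T = 1e-3, 0.05                  # Euler step and IRL interval length
K = 0.0                             # initial stabilizing gain (valid since a < 0)
rng = np.random.default_rng(0)

for it in range(8):                 # policy-iteration loop
    Phi, Y = [], []                 # regressors and interval costs
    for _ in range(30):             # collect 30 IRL data intervals
        x = rng.uniform(0.5, 1.5)   # re-sampled initial state for excitation
        x0, cost = x, 0.0
        for _ in range(round(T / dt)):
            u = -K * x                              # evaluate current policy
            cost += (q * x**2 + R * u**2) * dt      # integral of running cost
            x += (a * x + b * u) * dt               # Euler step (no disturbance)
        Phi.append(x0**2 - x**2)
        Y.append(cost)
    w = np.linalg.lstsq(np.array(Phi)[:, None], np.array(Y), rcond=None)[0][0]
    K = b * w / R                   # policy improvement: u = -(b*w/R) * x
    print(f"iter {it}: critic weight w = {w:.4f}, gain K = {K:.4f}")
# w converges toward sqrt(2) - 1 ≈ 0.4142, the scalar Riccati solution.
```

Because w is fitted directly to trajectory data, any unknown disturbance entering the state evolution also enters the regressors and can bias the estimate; per the abstract, this is exactly the failure mode the paper's disturbance compensation controller is designed to counteract.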
