Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator

Abstract

In this paper, an effective off-policy algorithm is proposed to solve the continuous-time nonzero-sum (NZS) control problem for unknown nonlinear systems with saturated actuators. A class of nonquadratic functions is used to construct the performance functions so that the input constraints are handled directly. Using the integral reinforcement learning (IRL) technique, an off-policy learning mechanism is introduced to design an iterative method for the continuous-time constrained NZS control problem that does not require knowledge of the system dynamics. To establish convergence, the traditional policy iteration (PI) method for the continuous-time NZS control problem with saturated actuators is first discussed, and the proposed method is then proved to be equivalent to it. Neural networks are used to construct an actor-critic structure, in which the critic networks approximate the iterative value functions and the actor networks approximate the iterative control policies. Finally, two simulation examples are given to verify the effectiveness of the proposed method.
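The abstract does not spell out the nonquadratic performance term or the constrained policy it produces. A minimal sketch follows, assuming the form that is common in the saturated-actuator literature rather than one confirmed by this paper: for player i with a scalar input bounded by |u_i| < λ and weight R_ii > 0, W(u_i) = 2 ∫_0^{u_i} λ tanh⁻¹(v/λ) R_ii dv, and setting ∂H_i/∂u_i = 0 in player i's Hamiltonian gives u_i = −λ tanh((1/(2λ)) R_ii⁻¹ g_i(x)ᵀ ∇V_i(x)). The function names and the sample gradient and input column are hypothetical. The code checks the closed form of W against numerical quadrature and confirms that the policy stays inside the bound and satisfies the stationarity condition.

```python
import numpy as np

# Assumed setup (not taken from the paper): one scalar input per player,
# actuator bound |u| < lam, positive scalar control weight r (playing R_ii).


def nonquadratic_cost(u, lam, r):
    """Closed form of W(u) = 2 * int_0^u lam * artanh(v / lam) * r dv for |u| < lam."""
    s = np.asarray(u, dtype=float) / lam
    return 2.0 * r * lam**2 * (s * np.arctanh(s) + 0.5 * np.log1p(-s**2))


def nonquadratic_cost_numeric(u, lam, r, n=20001):
    """Same integral evaluated with the trapezoidal rule, used as a cross-check."""
    v = np.linspace(0.0, u, n)
    y = lam * np.arctanh(v / lam) * r
    dv = v[1] - v[0]
    return 2.0 * dv * (y.sum() - 0.5 * (y[0] + y[-1]))


def cost_gradient(u, lam, r):
    """dW/du = 2 * lam * r * artanh(u / lam)."""
    return 2.0 * lam * r * np.arctanh(u / lam)


def saturated_policy(grad_v, g, lam, r):
    """u_i = -lam * tanh(g_i(x)^T grad V_i(x) / (2 * lam * r)); always |u_i| <= lam."""
    return -lam * np.tanh(g @ grad_v / (2.0 * lam * r))


if __name__ == "__main__":
    lam, r = 0.5, 1.0
    grad_v = np.array([0.3, -0.7])  # hypothetical critic gradient at some state x
    g = np.array([0.0, 1.0])        # hypothetical input column g_i(x)
    u = saturated_policy(grad_v, g, lam, r)
    print(f"u = {u:.4f}, |u| < lam: {abs(u) < lam}")
    # Stationarity of the Hamiltonian: dW/du + g^T grad V should vanish.
    print(f"residual = {cost_gradient(u, lam, r) + g @ grad_v:.2e}")
    print(f"W closed form = {nonquadratic_cost(0.4, lam, r):.6f}, "
          f"trapezoid = {nonquadratic_cost_numeric(0.4, lam, r):.6f}")
```

Because tanh saturates, the bound holds for any value-function gradient, so in this formulation no separate clipping of the control is needed. For small inputs, W(u) ≈ R_ii u², so the cost behaves like the usual quadratic penalty away from the actuator limits.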
