Robust Control of Unknown Observable Nonlinear Systems Solved as a Zero-Sum Game

An optimal robust control solution for general nonlinear systems with unknown but observable dynamics is proposed here. The underlying Hamilton-Jacobi-Isaacs (HJI) equation of the corresponding zero-sum two-player game (ZS-TP-G) is learned using a Q-learning-based approach that employs only input-output system measurements, relying on the assumed observability. An equivalent virtual state-space model is built from the system's input-output samples, and it is shown that controlling this virtual model implies controlling the original system. Since the existence of a saddle-point solution to the ZS-TP-G cannot be verified, the solution is derived in terms of upper-optimal and lower-optimal controllers. Learning convergence is theoretically guaranteed, while the practical implementation uses neural networks, which scale with the dimension of the control problem and provide automatic feature selection. The learning strategy is validated on an active suspension system, a representative robust control problem concerned with road-profile disturbance rejection.
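To make the zero-sum game formulation concrete, the following is a minimal sketch of its linear-quadratic special case only: value iteration on the game Riccati recursion for a discrete-time plant with a control player minimizing and a disturbance player maximizing a quadratic cost. It assumes a saddle point exists (the setting the paper explicitly avoids assuming, which is why upper- and lower-optimal controllers are derived there instead), and all matrices `A`, `B`, `D`, `Q`, `R` and the attenuation level `gamma` are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

# Hypothetical LQ zero-sum game: x_{k+1} = A x_k + B u_k + D w_k,
# cost = sum_k ( x'Qx + u'Ru - gamma^2 w'w ), u minimizes, w maximizes.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # stable plant dynamics (placeholder)
B = np.array([[0.0],
              [1.0]])        # control input channel
D = np.array([[0.1],
              [0.0]])        # disturbance input channel
Q = np.eye(2)                # state penalty
R = np.array([[1.0]])        # control penalty
gamma = 5.0                  # disturbance attenuation level

def game_kernel(P):
    """Joint-action block of the Q-function kernel over (u, w)."""
    S = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                  [D.T @ P @ B,     D.T @ P @ D - gamma**2 * np.eye(1)]])
    G = np.vstack([B.T @ P @ A, D.T @ P @ A])
    return S, G

# Value iteration on the game algebraic Riccati equation.
P = Q.copy()
for _ in range(1000):
    S, G = game_kernel(P)
    P_next = Q + A.T @ P @ A \
        - np.hstack([A.T @ P @ B, A.T @ P @ D]) @ np.linalg.solve(S, G)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Saddle-point feedback gains for both players from the converged kernel.
S, G = game_kernel(P)
K = np.linalg.solve(S, G)    # row 0: controller gain, row 1: disturbance gain
print("game value kernel P =\n", P)
```

When the saddle point does not exist (or cannot be verified, as in the nonlinear setting above), the min-max and max-min orderings of the two players generally yield different values, which is exactly what the upper-optimal and lower-optimal controllers capture.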
