Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems via Off-Policy Reinforcement Learning

This paper presents a model-free solution to the robust stabilization problem for discrete-time linear dynamical systems subject to bounded, mismatched uncertainty. The robust control problem is first recast as an optimal control problem whose solution reduces to an algebraic Riccati equation (ARE), and it is shown that the optimal controller obtained from this ARE robustly stabilizes the uncertain system. To solve the ARE without knowledge of the system dynamics, an off-policy reinforcement learning (RL) algorithm is developed. On- and off-policy RL methods are then compared with respect to their robustness to probing noise and their dependence on the system dynamics. Finally, a simulation example validates the efficacy of the proposed off-policy RL approach.
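To make the mechanism concrete, below is a minimal numerical sketch of off-policy policy iteration for a discrete-time ARE, not the paper's algorithm verbatim. The system matrices `A`, `B`, the noise level, and the data length are illustrative assumptions; the learner never touches `A` and `B` directly, they only generate the data, and the same data set collected under an exploratory behavior policy is reused at every iteration, which is the defining off-policy feature.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical two-state system used only for illustration; the learner
# never uses A and B directly -- they only generate the data.
np.random.seed(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1

# Step 1: collect data ONCE with an exploratory behavior policy
# u_k = -K0 x_k + e_k (the probing noise e_k provides excitation).
K = np.zeros((m, n))          # initial stabilizing gain (A is stable here)
data = []
x = np.array([1.0, -1.0])
for _ in range(300):
    u = -K @ x + 0.5 * np.random.randn(m)
    xn = A @ x + B @ u
    data.append((x, u, xn))
    x = xn

# Step 2: off-policy policy iteration. The SAME data set is reused at
# every iteration; only the target gain K changes. Each sample satisfies
#   x'Px - xn'P xn + 2 x'L1 v + v'L2 v = x'(Q + K'RK)x,
# with v = u + Kx, L1 = (A-BK)'PB, L2 = B'PB, so P, L1, L2 can be
# estimated by least squares without knowing A or B.
for _ in range(10):
    Phi, c = [], []
    for x, u, xn in data:
        v = u + K @ x                 # gap between behavior and target policy
        Phi.append(np.concatenate([
            np.kron(x, x) - np.kron(xn, xn),   # unknowns: vec(P)
            2.0 * np.kron(x, v),               # unknowns: vec(L1)
            np.kron(v, v)]))                   # unknowns: vec(L2)
        c.append(x @ (Q + K.T @ R @ K) @ x)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    P  = theta[:n*n].reshape(n, n);  P = 0.5 * (P + P.T)
    L1 = theta[n*n:n*n + n*m].reshape(n, m)
    L2 = theta[-m*m:].reshape(m, m); L2 = 0.5 * (L2 + L2.T)
    # Policy improvement: K <- (R + B'PB)^{-1} B'PA, using B'PA = L1' + L2 K.
    K = np.linalg.solve(R + L2, L1.T + L2 @ K)

# Sanity check against the model-based DARE solution.
P_star = solve_discrete_are(A, B, Q, R)
K_star = np.linalg.solve(R + B.T @ P_star @ B, B.T @ P_star @ A)
print("learned K:", K, "\nDARE K:  ", K_star)
```

Note how the probing noise enters the regression only through `v = u + Kx`: it excites the data without biasing the Bellman equation being solved, which is one reason off-policy schemes tolerate exploration noise better than on-policy ones.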
