H∞ control of linear discrete-time systems: Off-policy reinforcement learning

Abstract In this paper, a model-free solution to the H ∞ control of linear discrete-time systems is presented. The proposed approach employs off-policy reinforcement learning (RL) to solve the game algebraic Riccati equation online using measured data along the system trajectories. Like existing model-free RL algorithms, no knowledge of the system dynamics is required. However, the proposed method has two main advantages. First, the disturbance input does not need to be adjusted in a specific manner. This makes it more practical as the disturbance cannot be specified in most real-world applications. Second, there is no bias as a result of adding a probing noise to the control input to maintain persistence of excitation (PE) condition. Consequently, the convergence of the proposed algorithm is not affected by probing noise. An example of the H ∞ control for an F-16 aircraft is given. It is seen that the convergence of the new off-policy RL algorithm is insensitive to probing noise.

[1]  Frank L. Lewis,et al.  $ {H}_{ {\infty }}$ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[2]  Huai-Ning Wu,et al.  Neural Network Based Online Simultaneous Policy Update Algorithm for Solving the HJI Equation in Nonlinear $H_{\infty}$ Control , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[3]  Frank L. Lewis,et al.  Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics , 2014, Autom..

[4]  Tamer Başar,et al.  H1-Optimal Control and Related Minimax Design Problems , 1995 .

[5]  Derong Liu,et al.  Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics , 2014, IEEE Transactions on Automation Science and Engineering.

[6]  Tingwen Huang,et al.  Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design , 2014, Autom..

[7]  Tingwen Huang,et al.  Data-Driven $H_\infty$ Control for Nonlinear Distributed Parameter Systems , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[8]  A. Weeren,et al.  The discrete-time Riccati equation related to the H∞ control problem , 1994, IEEE Trans. Autom. Control..

[9]  Frank L. Lewis,et al.  Optimal Control , 1986 .

[10]  Richard S. Sutton,et al.  A Menu of Designs for Reinforcement Learning Over Time , 1995 .

[11]  Tingwen Huang,et al.  Off-Policy Reinforcement Learning for $ H_\infty $ Control Design , 2013, IEEE Transactions on Cybernetics.

[12]  Ben M. Chen Robust and H[∞] control , 2000 .

[13]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[14]  P. Khargonekar,et al.  State-space solutions to standard H2 and H∞ control problems , 1988, 1988 American Control Conference.

[15]  P. Khargonekar,et al.  State-space solutions to standard H/sub 2/ and H/sub infinity / control problems , 1989 .

[16]  G. Zames Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses , 1981 .

[17]  Andrew G. Barto,et al.  Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.

[18]  Zhong-Ping Jiang,et al.  Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics , 2012, Autom..

[19]  Paul J. Werbos,et al.  Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.

[20]  Frank L. Lewis,et al.  Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control , 2007, Autom..

[21]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[22]  Tingwen Huang,et al.  Reinforcement learning solution for HJB equation arising in constrained optimal control problem , 2015, Neural Networks.

[23]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[24]  A. Schaft L/sub 2/-gain analysis of nonlinear systems and nonlinear state-feedback H/sub infinity / control , 1992 .