Model-free H∞ control design for unknown linear discrete-time systems via Q-learning with LMI

This paper develops a model-free H∞ control design algorithm for unknown linear discrete-time systems using Q-learning, a reinforcement learning method based on an actor-critic structure. In model-free design, no dynamical model of the system is available: the system matrices are unknown, but the state and input variables can be measured. The paper derives an iterative solution algorithm for H∞ control design based on policy iteration. The algorithm is expressed in the form of linear matrix inequalities (LMIs) that do not involve the system matrices and require only data measured from the system state and input. It is shown that, for sufficiently rich disturbance, the algorithm converges to the standard H∞ control solution obtained using the exact system model. Two numerical examples demonstrate that the H∞ controller is obtained without any knowledge of the system dynamics matrices, and that the results converge to those obtained with the exact system matrices.
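To make the data-driven mechanism concrete, the following is a minimal sketch of Q-learning policy iteration for the zero-sum linear-quadratic game that underlies discrete-time H∞ design, assuming a Q-function that is quadratic in the state, control, and disturbance. The example plant, weights, and attenuation level γ are hypothetical and serve only to simulate measurements; the learning loop itself never reads the system matrices. Note that the paper formulates each policy-evaluation step as data-based LMIs, whereas this sketch solves it by ordinary least squares for brevity.

```python
# Minimal sketch of model-free Q-learning policy iteration for the
# discrete-time zero-sum LQ game that underlies H-infinity control.
# The plant matrices A, B, E are HYPOTHETICAL and are used only to
# simulate measurements; the learning loop never reads them.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant (open-loop stable so data collection stays bounded).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
E = np.array([[1.0],
              [0.0]])
n, m, q = 2, 1, 1              # state, control, disturbance dimensions
Qc, Rc = np.eye(n), np.eye(m)  # performance weights
gamma = 5.0                    # prescribed attenuation level

def phi(z):
    """Quadratic basis so that z' H z = phi(z) @ theta for symmetric H."""
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(len(z)) for j in range(i, len(z))])

def unpack(theta, N):
    """Rebuild the symmetric Q-kernel H from its upper-triangular entries."""
    H = np.zeros((N, N))
    H[np.triu_indices(N)] = theta
    return H + np.triu(H, 1).T

K = np.zeros((m, n))           # control policy      u = K x
L = np.zeros((q, n))           # disturbance policy  w = L x
N = n + m + q

for it in range(20):
    # Policy evaluation: solve z'Hz - z_next'Hz_next = r by least squares
    # using measured data only; probing noise keeps the regressors rich.
    rows, targets = [], []
    x = rng.standard_normal(n)
    for _ in range(300):
        u = K @ x + 0.3 * rng.standard_normal(m)
        w = L @ x + 0.3 * rng.standard_normal(q)
        x_next = A @ x + B @ u + E @ w          # measured next state
        z = np.concatenate([x, u, w])
        z_next = np.concatenate([x_next, K @ x_next, L @ x_next])
        r = x @ Qc @ x + u @ Rc @ u - gamma**2 * (w @ w)
        rows.append(phi(z) - phi(z_next))
        targets.append(r)
        x = x_next
        if np.linalg.norm(x) > 1e3:            # re-seed if the run drifts
            x = rng.standard_normal(n)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = unpack(theta, N)

    # Policy improvement from the blocks of the learned Q-kernel.
    Hxu, Hxw = H[:n, n:n+m], H[:n, n+m:]
    Huu, Huw, Hww = H[n:n+m, n:n+m], H[n:n+m, n+m:], H[n+m:, n+m:]
    K = -np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Huw.T),
                         Hxu.T - Huw @ np.linalg.solve(Hww, Hxw.T))
    L = -np.linalg.solve(Hww - Huw.T @ np.linalg.solve(Huu, Huw),
                         Hxw.T - Huw.T @ np.linalg.solve(Huu, Hxu.T))

print("learned state-feedback gain K:\n", K)
print("learned worst-case disturbance gain L:\n", L)
```

With sufficiently rich probing noise the least-squares regressors have full rank, which mirrors the richness condition on the disturbance under which the paper proves convergence to the model-based H∞ solution.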
