Off-Policy Reinforcement Learning for Optimal Preview Tracking Control of Linear Discrete-Time Systems with Unknown Dynamics

In this paper, an off-policy reinforcement learning (RL) algorithm is presented to solve the optimal preview tracking control problem for linear discrete-time systems with unknown dynamics. First, an augmented state-space system that includes the available preview knowledge as part of the state vector is constructed, which casts the preview tracking control problem as a standard linear quadratic regulator (LQR) problem. Second, reinforcement learning is used to solve the resulting algebraic Riccati equation (ARE) from online measurable data, without requiring a priori knowledge of the system matrices. Unlike existing off-policy RL algorithms, the proposed scheme addresses the preview tracking control problem. A numerical simulation example is given to verify the effectiveness of the proposed control scheme.
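The augmentation step described above can be illustrated with a minimal sketch. This is not the paper's data-driven algorithm: it is a model-based baseline that uses known matrices and a Kleinman/Hewer-style policy iteration to solve the ARE, which is the solution an off-policy RL scheme would aim to recover from data. The example matrices, the preview length `h`, the weights `Q` and `R`, the helper names, and the choice to fill the reference with zeros beyond the preview window are all assumptions made for illustration.

```python
import numpy as np

def build_augmented(A, B, C, h):
    """Stack x_k with the previewed references r_k, ..., r_{k+h}."""
    n, m = B.shape
    p = C.shape[0]
    N = p * (h + 1)
    # Shift register: slot i takes the value of slot i+1 at each step.
    # Assumption: the newest slot is filled with zero (no data past the window).
    S = np.eye(N, k=p)
    A_aug = np.block([[A, np.zeros((n, N))],
                      [np.zeros((N, n)), S]])
    B_aug = np.vstack([B, np.zeros((N, m))])
    # Tracking error e_k = C x_k - r_k = E z_k for the augmented state z_k.
    E = np.hstack([C, -np.eye(p), np.zeros((p, N - p))])
    return A_aug, B_aug, E

def dlyap(Ac, Qc):
    """Solve P = Ac^T P Ac + Qc by vectorization (fine for small systems)."""
    n = Ac.shape[0]
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    return np.linalg.solve(M, Qc.reshape(-1)).reshape(n, n)

def policy_iteration(A, B, Q, R, K0, iters=50):
    """Model-based policy iteration for the discrete-time LQR, u = -K z."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        P = dlyap(Ac, Q + K.T @ R @ K)                      # policy evaluation
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # policy improvement
    return P, K

# Hypothetical open-loop stable plant, so K0 = 0 is an admissible initial gain.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
h = 3                                   # preview length (assumed)
Q = np.array([[1.0]])                   # tracking-error weight (assumed)
R = np.array([[0.1]])                   # control weight (assumed)

A_aug, B_aug, E = build_augmented(A, B, C, h)
Q_aug = E.T @ Q @ E
K0 = np.zeros((B.shape[1], A_aug.shape[0]))
P, K = policy_iteration(A_aug, B_aug, Q_aug, R, K0)

# Residual of the ARE: P = A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q
G = np.linalg.solve(R + B_aug.T @ P @ B_aug, B_aug.T @ P @ A_aug)
are_residual = A_aug.T @ P @ A_aug - A_aug.T @ P @ B_aug @ G + Q_aug - P
```

In this sketch the first `n` columns of `K` act as state feedback and the remaining columns as feedforward gains on the previewed reference samples. A constant or persistently nonzero reference would call for a different reference model (for example an incremental or discounted formulation), which is outside what this example covers.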
