Reinforcement Q-Learning Incorporated With Internal Model Method for Output Feedback Tracking Control of Unknown Linear Systems

This paper investigates the output feedback (OPFB) tracking control problem for discrete-time linear (DTL) systems with unknown dynamics. Using an augmented-system approach, the tracking control problem is first converted into a regulation problem with a discounted performance function, whose solution relies on the Q-function-based Bellman equation. A novel value iteration (VI) scheme based on the reinforcement Q-learning mechanism is then proposed to solve the Q-function Bellman equation without knowledge of the system dynamics. Moreover, the convergence of the VI-based Q-learning is proved by showing that it converges to the solution of the Q-function Bellman equation and introduces no bias in the solution, even when probing noise satisfying the persistent excitation (PE) condition is injected. As a result, the OPFB tracking controller can be learned online using past input, output, and reference trajectory data of the augmented system. The proposed scheme also removes the requirement of an initial admissible policy imposed by the policy iteration (PI) method. Finally, the effectiveness of the proposed scheme is demonstrated through a simulation example.
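To make the described mechanism concrete, the following is a minimal, illustrative sketch of a value-iteration Q-learning loop for a discounted linear-quadratic tracking problem. It is not the paper's algorithm: for simplicity it assumes the full augmented state is directly measurable (the paper's OPFB setting instead reconstructs it from past input, output, and reference data), and all system matrices, the reference model, weights, and the discount factor are placeholder assumptions. The system matrices are used only to simulate data; the VI update itself relies solely on measured samples and starts from a zero Q-function, so no initial admissible policy is required.

```python
import numpy as np

np.random.seed(0)

# Illustrative augmented system X_{k+1} = T X_k + B1 u_k, with X = [x; r]
# stacking the plant state and the reference. Placeholder matrices only.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
F = np.array([[1.0]])                     # reference generator r_{k+1} = F r_k
C = np.array([[1.0, 0.0]])                # plant output y = C x

T = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])
C1 = np.hstack([C, -np.eye(1)])           # tracking error e = y - r = C1 X

Qe, R, gamma = np.eye(1), np.eye(1), 0.6  # discounted performance weights (assumed)
n, m = T.shape[0], B1.shape[1]
Q1 = C1.T @ Qe @ C1                       # stage cost on the augmented state

def quad_basis(z):
    """Quadratic basis so that theta . phi(z) = z^T H z for symmetric H."""
    outer = np.outer(z, z)
    idx = np.triu_indices(len(z))
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)   # off-diagonal terms appear twice
    return scale * outer[idx]

# Collect a batch of data with an exploratory (probing) input sequence.
N = 400
X = np.zeros((n, N + 1))
X[:, 0] = np.array([1.0, -0.5, 1.0])
U = np.zeros((m, N))
for k in range(N):
    U[:, k] = 0.5 * np.random.randn(m)             # persistently exciting probing input
    X[:, k + 1] = T @ X[:, k] + B1 @ U[:, k]       # simulation only; not used in learning

# Value iteration on the Q-function Q_j(X, u) = [X; u]^T H_j [X; u], model-free.
H = np.zeros((n + m, n + m))                        # VI can start from zero
for j in range(200):
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    P = Hxx - Hxu @ np.linalg.pinv(Huu) @ Hxu.T     # value matrix implied by H_j
    Phi, y = [], []
    for k in range(N):
        z = np.concatenate([X[:, k], U[:, k]])
        cost = X[:, k] @ Q1 @ X[:, k] + U[:, k] @ R @ U[:, k]
        Phi.append(quad_basis(z))
        y.append(cost + gamma * X[:, k + 1] @ P @ X[:, k + 1])   # VI target
    theta = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    H_new = np.zeros_like(H)
    H_new[np.triu_indices(n + m)] = theta
    H_new = H_new + H_new.T - np.diag(np.diag(H_new))            # symmetrize
    if np.linalg.norm(H_new - H) < 1e-8:
        break
    H = H_new

K = -np.linalg.pinv(H[n:, n:]) @ H[:n, n:].T        # learned gain, u = K X
print("learned gain K =", K)
```

Because each VI step regresses against a fixed target built from the previous Q-function estimate rather than the behavior policy's own next action, the probing noise enters only through the persistently exciting data and does not bias the least-squares solution, which mirrors the bias-free property claimed for the VI-based Q-learning above.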