Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem

Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most existing RL and ADP methods rely on full-state feedback, a requirement that is often difficult to satisfy in practical applications. Output feedback methods are therefore more desirable, as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) problem for discrete-time systems. The proposed scheme is completely online and does not require knowledge of the system dynamics. More specifically, a new representation of the LQR Q-function is developed in terms of input–output data. Based on this new Q-function representation, output feedback LQR controllers are designed, and two iterative output feedback Q-learning algorithms are presented, one based on policy iteration and the other on value iteration. The scheme has the advantage that it does not incur any excitation noise bias, which circumvents the need for discounted cost functions and in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out to illustrate the proposed scheme.
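
The abstract describes the method only at a high level. As a point of reference, below is a minimal sketch of the classical Q-learning policy-iteration mechanics for the *state-feedback* LQR case that such schemes build on. The system matrices A and B, the cost weights Q and R, the trajectory length, and the exploration noise level are all hypothetical placeholders; the paper's actual algorithms are output-feedback and would replace the state x_k with a window of past input–output measurements.

```python
# Minimal sketch: Q-learning policy iteration for discrete-time LQR.
# Hypothetical 2-state example; all matrices below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical open-loop-stable system (assumption, not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state cost weight
R = np.eye(1)          # input cost weight
n, m = B.shape

def svec_basis(z):
    """Quadratic basis for z'Hz with symmetric H: monomials z_i z_j, i <= j."""
    p = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(p) for j in range(i, p)])

def unvec(theta, p):
    """Rebuild the symmetric kernel H from the least-squares parameter vector."""
    H = np.zeros((p, p))
    idx = 0
    for i in range(p):
        for j in range(i, p):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

K = np.zeros((m, n))   # initial stabilizing gain (A is stable, so K = 0 works)
p = n + m

for it in range(20):
    # --- Policy evaluation: fit H from the Bellman equation along a trajectory.
    # Q_K(x_k, u_k) = x_k'Qx_k + u_k'Ru_k + Q_K(x_{k+1}, -K x_{k+1}) holds
    # exactly for the applied (noisy) input, so the fit incurs no noise bias.
    x = rng.standard_normal(n)
    Phi, y = [], []
    for k in range(200):
        u = -K @ x + 0.1 * rng.standard_normal(m)   # exploration noise
        x_next = A @ x + B @ u
        u_next = -K @ x_next                        # on-policy successor input
        z, z_next = np.concatenate([x, u]), np.concatenate([x_next, u_next])
        Phi.append(svec_basis(z) - svec_basis(z_next))
        y.append(x @ Q @ x + u @ R @ u)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = unvec(theta, p)

    # --- Policy improvement: minimizing z'Hz over u gives u = -H_uu^{-1} H_ux x.
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)

print("learned gain K:", K)
```

Under these assumptions the iteration reproduces Hewer's classical result: each improved gain is again stabilizing, and K converges to the optimal LQR gain, i.e., to the solution of the discrete-time Riccati equation, without ever using A or B in the update itself. The output-feedback formulation in the paper keeps this structure but parameterizes the Q-function over measured input–output data instead of the state.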
