Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning

In this technical note, an online learning algorithm is developed to solve the linear quadratic tracking (LQT) problem for partially unknown continuous-time systems. It is shown that the value function is quadratic in the states of the system and the command generator. Based on this quadratic form, an LQT Bellman equation and an LQT algebraic Riccati equation (ARE) are derived. The integral reinforcement learning technique is then used to find the solution of the LQT ARE online, without requiring knowledge of the system drift dynamics or of the command generator dynamics. Convergence of the proposed online algorithm to the optimal control solution is verified, and a simulation example illustrates the effectiveness of the approach.
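To make the procedure concrete, below is a minimal Python sketch of the integral reinforcement learning (IRL) policy iteration the abstract describes. The example setting is assumed, not taken from the note: a scalar plant dx/dt = ax + bu with a = -1 treated as unknown, a known input matrix b = 1, a constant-setpoint command generator dr/dt = 0, and illustrative weights and discount factor. The matrices A and F appear only to simulate data; the learner itself solves the LQT Bellman equation by least squares over measured intervals and improves the gain from the learned value matrix.

# Minimal sketch (assumed scalar example, not the note's simulation).
import numpy as np

A = np.array([[-1.0]])                 # drift dynamics (unknown to learner)
B = np.array([[1.0]])                  # input dynamics (assumed known)
F = np.array([[0.0]])                  # command generator (unknown to learner)
C = np.array([[1.0]])
Qe, R, gamma = np.array([[10.0]]), np.array([[1.0]]), 0.1

n, m = A.shape[0] + F.shape[0], B.shape[1]          # augmented state/input dims
A1 = np.block([[A, np.zeros((A.shape[0], F.shape[0]))],
               [np.zeros((F.shape[0], A.shape[1])), F]])   # simulation only
B1 = np.vstack([B, np.zeros((F.shape[0], m))])
C1 = np.hstack([C, -np.eye(F.shape[0])])
Q1 = C1.T @ Qe @ C1                    # tracking-error weight on X = [x; r]

def quad_basis(X):
    # Independent entries of X X^T, so that theta . phi(X) = X^T P X.
    M = np.outer(X, X)
    i, j = np.triu_indices(n)
    return np.where(i == j, 1.0, 2.0) * M[i, j]

def run_interval(X, K, T, dt=1e-3):
    # Euler integration over one reinforcement interval of length T;
    # returns the end state and the discounted integral of the running cost.
    cost, disc = 0.0, 1.0
    for _ in range(int(T / dt)):
        u = -K @ X
        cost += disc * float(X @ Q1 @ X + u @ R @ u) * dt
        X = X + (A1 @ X + B1 @ u) * dt
        disc *= np.exp(-gamma * dt)
    return X, cost

rng = np.random.default_rng(0)
K = np.zeros((m, n))                   # initial admissible gain
T, p = 0.05, n * (n + 1) // 2          # interval length, unknowns in P
for it in range(8):                    # policy iteration
    Phi, d, X = [], [], np.array([1.0, 1.0])
    for _ in range(4 * p):             # enough intervals for least squares
        Xn, c = run_interval(X, K, T)
        # LQT Bellman equation over [t, t+T] in regression form
        Phi.append(quad_basis(X) - np.exp(-gamma * T) * quad_basis(Xn))
        d.append(c)
        X = Xn + 0.2 * rng.standard_normal(n)   # re-excite the regression
    theta = np.linalg.lstsq(np.array(Phi), np.array(d), rcond=None)[0]
    P = np.zeros((n, n)); P[np.triu_indices(n)] = theta
    P = P + P.T - np.diag(np.diag(P))           # symmetric value matrix
    K = np.linalg.solve(R, B1.T @ P)            # policy improvement
    print(f"iteration {it}: K = {K.ravel()}")

Only B1 enters the learning updates, matching the note's partially unknown setting: the least-squares step plays the role of the LQT Bellman equation, and the gain update is the standard Kleinman-type policy improvement applied to the augmented state.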
