Paper information - Optimal trajectory Output Tracking control with a Q-learning algorithm

Optimal trajectory Output Tracking control with a Q-learning algorithm

In this paper, a novel Q-learning algorithm is proposed to solve the Linear Quadratic Output Tracking (LQOT) control problem for a linear time-invariant system with completely unknown system and reference dynamics. We first augment the system and reference states and appropriately choose the user-defined matrices in the performance index of the augmented state, and then define an action-dependent value function for the LQOT problem. An integral reinforcement learning approach is used to build a learning structure that estimates the parameters of the Q-function online while guaranteeing closed-loop stability, trajectory tracking, and convergence to the optimal tracking solution. A simulation of an unknown spring-mass-damper linear system demonstrates the efficacy of the proposed approach.
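The augmentation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook linear-quadratic tracking setup (plant x' = A x + B u, output y = C x, reference r' = F r, tracking error e = C x - r) and a generic quadratic action-dependent value function in the stacked vector [z; u]. The function names, the numerical plant values, and the matrix H used for illustration are hypothetical and are not taken from the paper, whose exact parameterization may differ.

```python
import numpy as np

def augment(A, B, C, F, Q):
    """Stack plant state x and reference state r into z = [x; r].

    Assumes the reference has the same dimension as the output y = C x.
    Returns the augmented dynamics and the state weight that penalizes
    the tracking error e = C x - r.
    """
    n, m = B.shape
    p = F.shape[0]
    A_aug = np.block([[A, np.zeros((n, p))],
                      [np.zeros((p, n)), F]])
    B_aug = np.vstack([B, np.zeros((p, m))])  # the input does not drive the reference
    E = np.hstack([C, -np.eye(p)])            # e = E z
    Q_aug = E.T @ Q @ E                       # e^T Q e = z^T Q_aug z
    return A_aug, B_aug, Q_aug

def greedy_gain(H, nz):
    """Gain K such that u = -K z minimizes [z; u]^T H [z; u] over u.

    H is a symmetric matrix partitioned as [[H_zz, H_zu], [H_uz, H_uu]]
    with H_uu positive definite; nz is the dimension of z.
    """
    H_uz = H[nz:, :nz]
    H_uu = H[nz:, nz:]
    return np.linalg.solve(H_uu, H_uz)

# Hypothetical spring-mass-damper: mass 1, stiffness 5, damping 0.5.
A = np.array([[0.0, 1.0], [-5.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])   # measure position only
F = np.array([[0.0]])        # constant reference (r' = 0)
Q = np.array([[10.0]])       # user-chosen weight on the tracking error

A_aug, B_aug, Q_aug = augment(A, B, C, F, Q)

# The augmented weight reproduces the tracking-error cost.
z = np.array([0.3, 0.0, 1.0])        # position 0.3, velocity 0, reference 1
e = C @ z[:2] - z[2:]
error_cost = (e @ Q @ e).item()
augmented_cost = (z @ Q_aug @ z).item()
```

In a model-free scheme of the kind the abstract describes, the entries of a matrix like H would be estimated from measured data rather than computed from A, B and F; once H is available, `greedy_gain` needs no model knowledge, which is the appeal of an action-dependent value function.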

Kyriakos G. Vamvoudakis
