Model-Free Optimal Tracking Control via Critic-Only Q-Learning

Model-free control is an important and promising topic in the control field and has attracted extensive attention in the past few years. In this paper, we aim to solve the model-free optimal tracking control problem for nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, a Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network to approximate the Q-function, the CoQL method is then developed to implement the Q-learning algorithm. Furthermore, the convergence of the CoQL method is proved with the neural network approximation error taken into account. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on a gradient descent scheme. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies. Because the CoQL method learns from off-policy data and is implemented with a critic-only structure, it is easy to realize and overcomes the inadequate exploration problem.
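
The pipeline summarized above (augmented system, critic-only Q-learning from off-policy data, control recovered by gradient descent on the Q-function) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a hypothetical scalar linear plant with a constant reference (the constants A, B, C), a quadratic basis fitted by least squares in place of the neural-network critic, and an arbitrarily chosen cost and discount factor (QE, RU, GAMMA). All function names are illustrative, and plain Python is used only to keep the example dependency-free.

```python
import random

# Hypothetical scalar plant and reference generator, used only to generate data.
# The learner itself never reads A, B, or C.
A, B, C = 0.9, 0.5, 1.0          # x' = A*x + B*u,  r' = C*r
QE, RU, GAMMA = 1.0, 0.1, 0.9    # error weight, control weight, discount factor

def step(e, r, u):
    """One transition of the augmented state (tracking error e, reference r)."""
    x_next, r_next = A * (e + r) + B * u, C * r
    return x_next - r_next, r_next

def features(e, r, u):
    """Quadratic basis for a linear-in-weights critic Q(e, r, u) = w . phi."""
    return [e * e, e * r, e * u, r * r, r * u, u * u]

def q_value(w, e, r, u):
    return sum(wi * fi for wi, fi in zip(w, features(e, r, u)))

def greedy_u(w, e, r):
    """Closed-form minimizer over u; returns 0 while the critic is flat in u."""
    return 0.0 if w[5] <= 1e-12 else -(w[2] * e + w[4] * r) / (2.0 * w[5])

def solve(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [yi] for row, yi in zip(M, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def learn_critic(data, max_iters=1000, tol=1e-10):
    """Q-learning by value iteration: fit w to cost + GAMMA * min_u' Q(z', u')."""
    phis = [features(e, r, u) for e, r, u, _, _ in data]
    gram = [[sum(p[i] * p[j] for p in phis) for j in range(6)] for i in range(6)]
    w = [0.0] * 6
    for _ in range(max_iters):
        rhs = [0.0] * 6
        for p, (e, r, u, e2, r2) in zip(phis, data):
            best_next = q_value(w, e2, r2, greedy_u(w, e2, r2))
            target = QE * e * e + RU * u * u + GAMMA * best_next
            for i in range(6):
                rhs[i] += p[i] * target
        w_new = solve(gram, rhs)  # least-squares fit via the normal equations
        done = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if done:
            break
    return w

def control_by_gradient_descent(w, e, r, lr=0.5, steps=200):
    """Recover the control by descending dQ/du, starting from u = 0."""
    u = 0.0
    for _ in range(steps):
        u -= lr * (w[2] * e + w[4] * r + 2.0 * w[5] * u)
    return u

# Off-policy data: states and controls are sampled at random,
# not generated by the policy being learned.
rng = random.Random(0)
data = []
for _ in range(300):
    e, r, u = (rng.uniform(-1.0, 1.0) for _ in range(3))
    data.append((e, r, u, *step(e, r, u)))

w = learn_critic(data)

# Closed-loop check: track r = 1 starting from x = 0 (so e = -1).
e, r = -1.0, 1.0
for _ in range(50):
    e, r = step(e, r, control_by_gradient_descent(w, e, r))
print("critic weights:", [round(wi, 4) for wi in w])
print("final tracking error:", e)
```

Because the toy problem is linear-quadratic, the quadratic basis represents the Q-function exactly, so no approximation error appears here; with a neural-network critic on a nonaffine plant, the fit would be inexact, which is the case the convergence analysis with approximation error addresses. The fixed step size in `control_by_gradient_descent` is likewise an assumption that happens to suit this example.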
