Online optimal and adaptive integral tracking control for varying discrete‐time systems using reinforcement learning

The conventional closed-form solution to the optimal control problem is available only when the system dynamics are known and described by differential equations. Without such models, reinforcement learning (RL) has been applied successfully to solve the optimal control problem iteratively for unknown or varying systems. For the optimal tracking control problem, however, existing RL techniques in the literature assume either a predetermined feedforward input for the tracking control, restrictive assumptions on the reference model dynamics, or discounted tracking costs; moreover, with discounted tracking costs, zero steady-state error cannot be guaranteed. This article therefore presents an online optimal RL tracking control framework for discrete-time (DT) systems that imposes none of these restrictive assumptions and guarantees zero steady-state tracking error. This is achieved by augmenting the original system dynamics with the integral of the error between the reference inputs and the tracked outputs, and using the augmented system in the online RL framework. It is further shown that the value function for the DT linear quadratic tracker under this augmented formulation with integral control remains quadratic. This enables the development of Bellman equations that use only system measurements to solve the corresponding DT algebraic Riccati equation and obtain the optimal tracking control inputs online. Two RL strategies are then proposed, based on value function approximation and on Q-learning, together with bounds on the excitation required for convergence of the parameter estimates. Simulation case studies demonstrate the effectiveness of the proposed approach.
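As a rough sketch of the integral-augmentation idea described above (not the paper's data-driven algorithms — this model-based stand-in solves the DT algebraic Riccati equation by fixed-point iteration with full knowledge of the dynamics, and all matrices, weights, and the scalar example are chosen here purely for illustration):

```python
import numpy as np

def augment_with_integrator(A, B, C):
    """Augment x_{k+1} = A x_k + B u_k, y_k = C x_k with the running
    sum of the tracking error: q_{k+1} = q_k + (r_k - C x_k).
    The augmented state is z_k = [x_k; q_k]."""
    n, m = B.shape
    p = C.shape[0]
    Aa = np.block([[A, np.zeros((n, p))],
                   [-C, np.eye(p)]])
    Ba = np.vstack([B, np.zeros((p, m))])
    return Aa, Ba

def solve_dare_iterative(A, B, Q, R, iters=500):
    """Fixed-point iteration on the DT algebraic Riccati equation;
    returns the value-function kernel P and feedback gain K."""
    P = Q.copy()
    for _ in range(iters):
        S = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Scalar example: track a constant reference r = 1.
A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Aa, Ba = augment_with_integrator(A, B, C)
Q = np.diag([1.0, 10.0])   # weights on state and integral error (illustrative)
R = np.array([[1.0]])
P, K = solve_dare_iterative(Aa, Ba, Q, R)

x, q, r = 0.0, 0.0, 1.0
for _ in range(400):
    u = float(-K @ np.array([x, q]))  # feedback on the augmented state
    x_next = 0.9 * x + u
    q = q + (r - x)                   # accumulate tracking error
    x = x_next
# With the integrator in the loop, x converges to r without a
# model-based feedforward term: at steady state q can only be
# constant if the tracking error r - x is zero.
```

The same augmented formulation is what makes the value function quadratic in `z_k = [x_k; q_k]`, so in the paper's setting `P` (and hence `K`) can instead be identified online from measurements via the Bellman equations, without knowing `A`, `B`, `C`.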