Nearly data-based optimal control for linear discrete model-free systems with delays via reinforcement learning

This paper proposes a nearly optimal, data-based control scheme for linear discrete-time systems with delays whose model is unknown. The nearly optimal control is obtained using only measured input/output data from the system, through a reinforcement learning technique that combines Q-learning with a value iteration algorithm. First, a state estimator is constructed from the measured input/output data. Second, a quadratic functional is used to approximate the value function at each point in the state space, and the data-based control is designed by the Q-learning method using the obtained state estimator. The paper then describes how to solve for the optimal inner kernel matrix in the least-squares sense by the value iteration algorithm. Finally, numerical examples are given to illustrate the effectiveness of the approach.
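
To make the idea of fitting a quadratic kernel matrix by least squares inside a value iteration more concrete, the following is a minimal sketch and not the paper's algorithm. It makes two simplifying assumptions that the abstract does not: the system has no delays, and the state is measured directly instead of being estimated from input/output data. The function name `q_learning_value_iteration`, the plant matrices `A` and `B`, and the weights `Qc` and `Rc` are hypothetical choices for illustration; the plant matrices are used only to generate data and are never passed to the learner.

```python
import numpy as np

def q_learning_value_iteration(X, U, X_next, Qc, Rc, n_iter=200):
    """Fit Q(x, u) = z^T H z with z = [x; u] from data by value iteration."""
    n, m = X.shape[1], U.shape[1]
    Z = np.hstack([X, U])
    # Each row of Phi is kron(z, z), so that Phi @ vec(H) equals z^T H z.
    Phi = np.einsum("ki,kj->kij", Z, Z).reshape(len(Z), -1)
    # Stage cost x^T Qc x + u^T Rc u for every sample.
    stage_cost = (np.einsum("ki,ij,kj->k", X, Qc, X)
                  + np.einsum("ki,ij,kj->k", U, Rc, U))
    H = np.zeros((n + m, n + m))   # kernel matrix, initialised at zero
    K = np.zeros((m, n))           # feedback gain, u = -K x
    for _ in range(n_iter):
        # Evaluate the current kernel at the next state under the current gain.
        U_next = -X_next @ K.T
        Z_next = np.hstack([X_next, U_next])
        target = stage_cost + np.einsum("ki,ij,kj->k", Z_next, H, Z_next)
        # Least-squares solve for the new kernel, then symmetrise it.
        h, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)
        # Greedy gain from the blocks of H: K = H_uu^{-1} H_ux.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return H, K

# Hypothetical plant, used only to generate data (assumption, not from the paper).
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)

X = rng.normal(size=(50, 2))       # sampled states
U = rng.normal(size=(50, 1))       # exploratory inputs
X_next = X @ A.T + U @ B.T         # measured successor states

H, K = q_learning_value_iteration(X, U, X_next, Qc, Rc)
print("learned gain K =", K)
```

In this simplified setting the learned gain can be checked against a model-based Riccati recursion, which is possible here only because the illustrative plant is known.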
