Infinite-time stochastic linear quadratic optimal control for unknown discrete-time systems using adaptive dynamic programming approach

In this paper, an adaptive dynamic programming (ADP) algorithm based on value iteration (VI) is proposed to solve the infinite-time stochastic linear quadratic (SLQ) optimal control problem for linear discrete-time systems whose dynamics are completely unknown. First, the SLQ control problem is converted into a deterministic problem through a system transformation, and an iterative ADP algorithm is introduced to solve the resulting optimal control problem; its convergence is analyzed. Second, to implement the iterative algorithm, one neural network (NN) is used to identify the unknown system, and two further NNs are employed to approximate the cost function and the control gain matrix. Finally, the effectiveness of the iterative ADP approach is illustrated by two simulation examples.
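To make the value-iteration idea concrete, the following is a minimal sketch of VI on the standard discrete-time Riccati recursion with multiplicative noise. It is not the paper's algorithm: it assumes the model x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k with zero-mean, unit-variance noise w_k and stage cost x'Qx + u'Ru, and it assumes the matrices A, B, C, D are known, whereas the paper identifies the system with a neural network. The function name `slq_value_iteration` and the example matrices are hypothetical.

```python
import numpy as np

def slq_value_iteration(A, B, C, D, Q, R, iters=500, tol=1e-10):
    """Value iteration for an assumed SLQ model with multiplicative noise.

    Assumed dynamics: x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k,
    with E[w_k] = 0 and E[w_k^2] = 1, and stage cost x'Qx + u'Ru.
    Returns the cost matrix P and gain K of the feedback u = K x.
    """
    n = A.shape[0]
    P = np.zeros((n, n))  # VI starts from the zero cost function
    for _ in range(iters):
        H_uu = R + B.T @ P @ B + D.T @ P @ D
        H_ux = B.T @ P @ A + D.T @ P @ C
        K = -np.linalg.solve(H_uu, H_ux)  # greedy gain for the current P
        P_next = Q + A.T @ P @ A + C.T @ P @ C + H_ux.T @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the gain so that it matches the returned P.
    H_uu = R + B.T @ P @ B + D.T @ P @ D
    H_ux = B.T @ P @ A + D.T @ P @ C
    K = -np.linalg.solve(H_uu, H_ux)
    return P, K

# Hypothetical two-state example (values chosen only for illustration).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
D = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P, K = slq_value_iteration(A, B, C, D, Q, R)
print("P =\n", P)
print("K =", K)
```

Starting from P = 0 mirrors the usual VI initialization; in a data-driven setting the model-based updates above would be replaced by approximations learned from measurements.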
