Optimal control for unknown mean-field discrete-time system based on Q-Learning

Solving the optimal mean-field control problem usually requires complete knowledge of the system. This paper presents a Q-learning algorithm for the optimal control problem of an unknown mean-field discrete-time stochastic system. First, a suitable transformation converts the stochastic mean-field control problem into a deterministic one. Second, the H matrix is derived from the Q-function, and the control strategy depends only on this H matrix, so solving for the H matrix is equivalent to solving the mean-field optimal control problem. The proposed Q-learning method iteratively computes the H matrix and the gain matrix from the supplied system state information, without requiring knowledge of the system parameters. Next, it is proved that the sequence of control matrices produced by Q-learning converges to the optimal control, which establishes the theoretical feasibility of the method. Finally, two simulation examples verify the effectiveness of the Q-learning algorithm.
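To make the role of the H matrix concrete, the following is a minimal sketch of value-iteration Q-learning for a plain deterministic linear-quadratic problem. It is not the algorithm of the paper: the mean-field terms and the transformation mentioned above are omitted, and the function names, the example matrices A and B, the weights Q and R, and the sampling scheme are all assumptions made for illustration. The sketch assumes a quadratic Q-function z^T H z with z = [x; u], fits H by least squares from recorded transitions, and reads the gain off H as K = -H_uu^{-1} H_ux. The matrices A and B are used only to generate data; the learner never sees them.

```python
import numpy as np

def quad_basis(z):
    """Monomials z_i z_j (i <= j); off-diagonal terms doubled so weights equal H entries."""
    n = z.size
    return np.array([(1.0 if i == j else 2.0) * z[i] * z[j]
                     for i in range(n) for j in range(i, n)])

def unpack_symmetric(theta, n):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta
    return H + np.triu(H, 1).T

def gain_from_H(H, nx):
    """Greedy feedback u = K x with K = -H_uu^{-1} H_ux."""
    return -np.linalg.solve(H[nx:, nx:], H[nx:, :nx])

def q_learning_lq(X, U, X_next, Q, R, iters=200):
    """Value-iteration Q-learning on recorded transitions (x, u, x_next)."""
    nx, nu = X.shape[1], U.shape[1]
    # Initial H: stage cost only (zero cost-to-go), an assumed starting point.
    H = np.block([[Q, np.zeros((nx, nu))], [np.zeros((nu, nx)), R]])
    Phi = np.array([quad_basis(z) for z in np.hstack([X, U])])
    stage_cost = (np.einsum("ki,ij,kj->k", X, Q, X)
                  + np.einsum("ki,ij,kj->k", U, R, U))
    for _ in range(iters):
        K = gain_from_H(H, nx)
        # Next-step pair [x_next; K x_next] under the current greedy gain.
        Z_next = np.hstack([X_next, X_next @ K.T])
        target = stage_cost + np.einsum("ki,ij,kj->k", Z_next, H, Z_next)
        # Least-squares fit of the new H to the Bellman targets.
        theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        H = unpack_symmetric(theta, nx + nu)
    return H, gain_from_H(H, nx)

# Hypothetical example system; A and B only simulate data for the learner.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 1.1]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
X = rng.standard_normal((60, 2))
U = rng.standard_normal((60, 1))
X_next = X @ A.T + U @ B.T

H, K = q_learning_lq(X, U, X_next, Q, R)
print("learned gain K =", K)
```

Because the data in this sketch are noise-free and randomly sampled, each least-squares step reproduces one step of the Riccati recursion, so the learned gain can be checked against a model-based solution. With noisy or trajectory-based data, excitation and sample size would need more care.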
