Paper information - Optimal control for unknown mean-field discrete-time system based on Q-Learning

Solving the optimal mean-field control problem usually requires complete system information. This paper discusses a Q-learning algorithm for the optimal control problem of an unknown mean-field discrete-time stochastic system. First, a suitable transformation turns the stochastic mean-field control problem into a deterministic one. Second, the H matrix is obtained from the Q-function, and the control strategy depends only on the H matrix, so solving for the H matrix is equivalent to solving the mean-field optimal control problem. The proposed Q-learning method iteratively solves for the H matrix and the gain matrix from the input system state information, without knowledge of the system parameters. Next, the control matrix sequence obtained by Q-learning is proved to converge to the optimal control, which establishes the theoretical feasibility of the method. Finally, two simulation cases verify the effectiveness of the Q-learning algorithm.
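To make the role of the H matrix concrete, the following is a minimal sketch of Q-learning for an ordinary deterministic discrete-time linear-quadratic problem, not the paper's mean-field algorithm, whose details the abstract does not give. It assumes a quadratic Q-function Q(x, u) = zᵀHz with z = [x; u], fits H by least squares from sampled transitions, and updates the gain as K = H_uu⁻¹ H_ux with u = -Kx. The plant in `simulate_step`, the weights `Q` and `R`, the zero initial gain (assumed stabilizing), and every function name are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_step(x, u):
    """Hypothetical plant. The learner only sees the returned next state."""
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    return A @ x + B @ u


def learn_gain(step, n, m, Q, R, iters=15, samples=60, seed=0):
    """Policy-iteration Q-learning sketch for a deterministic LQ problem.

    step : callable (x, u) -> next state, treated as a black box
    n, m : state and input dimensions
    Q, R : stage-cost weights, cost = x'Qx + u'Ru (assumed known)
    Returns the last fitted H matrix and the gain K (control u = -K x).
    """
    rng = np.random.default_rng(seed)
    K = np.zeros((m, n))  # assumption: the zero gain is stabilizing
    H = np.zeros((n + m, n + m))
    for _ in range(iters):
        rows, costs = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = rng.standard_normal(m)  # exploratory input
            x_next = step(x, u)
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])
            # Bellman equation: z'Hz - z_next'H z_next = stage cost
            rows.append(np.kron(z, z) - np.kron(z_next, z_next))
            costs.append(x @ Q @ x + u @ R @ u)
        h, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)  # keep H symmetric
        # Policy improvement: minimize z'Hz over u, giving u = -K x
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return H, K


if __name__ == "__main__":
    H, K = learn_gain(simulate_step, n=2, m=1, Q=np.eye(2), R=np.eye(1))
    print("learned gain K =", K)
```

In this sketch the system matrices appear only inside the simulator, so the learner works from state and input samples alone. For this noise-free example the learned gain can be checked against the one obtained by iterating the Riccati equation with the true matrices.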
