Pareto optimal control of the mean-field stochastic systems by adaptive dynamic programming algorithm.

This paper studies the Pareto game for model-free continuous-time stochastic systems using approximate/adaptive dynamic programming (ADP). First, a model-based online iterative algorithm is proposed, and its control sequence is proved to converge to the Pareto efficient solution; this algorithm, however, requires complete knowledge of the system parameters. A model-free iterative equation is then derived, and an ADP algorithm is developed to solve it from state and input data collected online. The derivation shows that the model-free and model-based iterative equations have the same solution, so the ADP algorithm can approximate the Pareto optimal solution. A convergence analysis further shows that the ADP algorithm determines the Pareto optimal strategy uniquely. Finally, two simulation examples confirm the feasibility of the ADP algorithm.
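
The model-based iteration summarized above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a scalar state, two players, state-dependent noise, no mean-field term, and a weighted-sum scalarization of the two costs (a standard route to Pareto-efficient solutions in the convex linear-quadratic case). The function names, the dynamics, the cost structure, and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def pareto_policy_iteration(a, b, c, q, r, alpha, k0, iters=30):
    """Model-based policy iteration for a scalar two-player LQ Pareto problem.

    Assumed dynamics (illustrative): dx = (a*x + b[0]*u1 + b[1]*u2) dt + c*x dw.
    Assumed cost of player i: E int (q[i]*x^2 + r[i][0]*u1^2 + r[i][1]*u2^2) dt.
    Feedback law: u_j = -k[j] * x.
    """
    # Weighted-sum scalarization of the two players' costs (assumption).
    qw = alpha * q[0] + (1.0 - alpha) * q[1]
    rw = [alpha * r[0][j] + (1.0 - alpha) * r[1][j] for j in range(2)]

    k = list(k0)
    p = None
    for _ in range(iters):
        # Closed-loop drift and mean-square decay rate under the current gains.
        ac = a - b[0] * k[0] - b[1] * k[1]
        rate = 2.0 * ac + c * c
        if rate >= 0.0:
            raise ValueError("gains are not mean-square stabilizing")
        # Policy evaluation: scalar generalized Lyapunov equation
        #   (2*ac + c^2) * p + qw + rw[0]*k[0]^2 + rw[1]*k[1]^2 = 0.
        p = (qw + rw[0] * k[0] ** 2 + rw[1] * k[1] ** 2) / (-rate)
        # Policy improvement: k_j = b_j * p / rw_j.
        k = [b[j] * p / rw[j] for j in range(2)]
    return p, k


def riccati_closed_form(a, b, c, qw, rw):
    """Positive root of (2a + c^2) p + qw - p^2 (b1^2/rw1 + b2^2/rw2) = 0."""
    s = b[0] ** 2 / rw[0] + b[1] ** 2 / rw[1]
    m = 2.0 * a + c * c
    return (m + math.sqrt(m * m + 4.0 * s * qw)) / (2.0 * s)


# Illustrative run: the initial gains must already be mean-square stabilizing.
p, k = pareto_policy_iteration(
    a=1.0, b=(1.0, 0.5), c=0.5,
    q=(1.0, 2.0), r=((1.0, 2.0), (2.0, 1.0)),
    alpha=0.5, k0=(2.0, 0.0),
)
print("value coefficient p:", p)
print("feedback gains k:", k)
```

Every step of this sketch uses the system parameters `a`, `b`, and `c` explicitly, which is the dependence that the model-free ADP formulation removes by working from measured state and input data. Varying `alpha` over (0, 1) traces different weighted-sum solutions in this convex setting.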
