Stackelberg games for model-free continuous-time stochastic systems based on adaptive dynamic programming

Abstract Solving the Stackelberg game problem generally requires complete knowledge of the system dynamics. In this paper, two online adaptive dynamic programming algorithms are proposed to solve the Stackelberg game problem for model-free linear continuous-time systems subject to multiplicative noise. Two different strategies are considered: the Nash-based Stackelberg strategy and the Pareto-based Stackelberg strategy. State and input data are used directly to update the Stackelberg strategies iteratively online. The effectiveness of the algorithms is verified by two simulation examples.
