Common Information Belief based Dynamic Programs for Stochastic Zero-sum Games with Competing Teams

Decentralized team problems in which players have asymmetric information about the state of the underlying stochastic system have been actively studied, but games between such teams are less well understood. We consider a general model of zero-sum stochastic games between two competing teams. This model subsumes many previously considered team and zero-sum game models. For this general model, we provide bounds on the upper (min-max) and lower (max-min) values of the game. Furthermore, if the upper and lower values are identical (i.e., if the game has a value), our bounds coincide with the value of the game. The bounds are obtained using two dynamic programs based on a sufficient statistic known as the common information belief (CIB). We also identify certain information structures in which only the minimizing team controls the evolution of the CIB. In these cases, we show that one of our CIB-based dynamic programs can be used to find the min-max strategy in addition to the min-max value. We propose an approximate dynamic programming approach for computing the values (and the strategy, when applicable) and illustrate our results with an example.
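
For reference, the upper and lower values mentioned above can be written out as follows. The notation is assumed here for illustration and is not taken from the paper: $g^1$ and $g^2$ denote the strategies of the minimizing and maximizing teams, $J(g^1, g^2)$ denotes the expected total cost, and the extrema are assumed to be attained.

```latex
% Assumed notation: g^1 = minimizing team's strategy, g^2 = maximizing team's strategy,
% J = expected total cost. Extrema are assumed to exist.
\begin{align*}
  S^{u} &= \min_{g^1} \max_{g^2} J(g^1, g^2) && \text{(upper, min-max value)} \\
  S^{l} &= \max_{g^2} \min_{g^1} J(g^1, g^2) && \text{(lower, max-min value)} \\
  S^{l} &\le S^{u}                           && \text{(always holds)}
\end{align*}
```

In this notation, the game has a value exactly when $S^{l} = S^{u}$.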
