Public Information Representation for Adversarial Team Games

The study of sequential games in which a team plays against an adversary is receiving increasing attention in the scientific literature. Their peculiarity lies in the asymmetric information available to the team members during play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to Linear Programming and column generation techniques. Such representations prevent the adoption of standard tools for generating abstractions, which have previously proved crucial in solving huge two-player zero-sum games. Unlike those works, we investigate the problem of designing a suitable game representation over which abstraction algorithms can work. In particular, our algorithms convert a sequential team game with adversaries into a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player which only knows information common to the whole team and prescribes to the players an action for any possible private state. Our conversion enables the adoption of highly scalable techniques already available for two-player zero-sum games, including techniques for generating automated abstractions. Because of the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one. To limit this explosion, we design three pruning techniques that dramatically reduce the size of the tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games to the converted games.
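The coordinator described above acts on public information only: instead of choosing a single action, it chooses a *prescription*, i.e., one action for every private state a team member might be in. A minimal sketch of why this causes the exponential blow-up mentioned above (all names here are illustrative, not taken from the paper):

```python
from itertools import product

def prescriptions(private_states, actions):
    """Enumerate all prescriptions: every map from private state to action.

    The coordinator cannot condition on a teammate's private state, so each
    of its "actions" in the converted game must specify an action for every
    possible private state. The count is |actions| ** |private_states|,
    which is the source of the exponential growth of the Public Team game.
    """
    return [dict(zip(private_states, combo))
            for combo in product(actions, repeat=len(private_states))]

# Toy example: three possible private cards, two available actions.
states = ["jack", "queen", "king"]
acts = ["check", "bet"]

ps = prescriptions(states, acts)
print(len(ps))  # 2**3 = 8 prescriptions
```

With a per-player deck of size n and a actions, each coordinator decision point has a**n prescriptions, which is what motivates the pruning techniques the abstract refers to.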
