Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight-rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm, which achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games of moderate length. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
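
To make "no-regret learning with respect to a set of deviations" concrete, the following is a minimal sketch of regret matching in a single repeated decision, where the deviation set is the external deviations ("always play action a instead"). It is not EFR and does not handle extensive-form structure or time selection; the function names `regret_matching`, `external_regrets`, and `run` are hypothetical and chosen only for this illustration.

```python
def regret_matching(cum_regret):
    """Return a strategy proportional to positive cumulative regret.

    Falls back to the uniform strategy when no deviation has positive regret.
    """
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def external_regrets(strategy, action_values):
    """Instantaneous regret for each external deviation.

    Entry a is the gain from always playing action a rather than
    following `strategy`, given this round's action values.
    """
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [v - expected for v in action_values]


def run(action_value_sequence):
    """Play regret matching against a fixed sequence of action values.

    Returns the final cumulative regrets and the strategy they induce.
    """
    n = len(action_value_sequence[0])
    cum_regret = [0.0] * n
    for values in action_value_sequence:
        strategy = regret_matching(cum_regret)
        inst = external_regrets(strategy, values)
        cum_regret = [c + r for c, r in zip(cum_regret, inst)]
    return cum_regret, regret_matching(cum_regret)
```

In this sketch, a larger deviation set would simply mean tracking one cumulative regret per deviation. The abstract's point is that, in extensive-form games, the cost of doing so depends on how complex the chosen set of behavioral deviations is.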
