Paper information - Game theory-based opponent modeling in large imperfect-information games

We develop an algorithm for opponent modeling in large extensive-form games of imperfect information. It observes the opponent's action frequencies and builds an opponent model by combining these observations with information from a precomputed equilibrium strategy. It then computes and plays a best response to this opponent model; both the opponent model and the best response are updated continually in real time. The approach combines game-theoretic reasoning with pure opponent modeling, yielding a hybrid that can effectively exploit opponents after only a small number of interactions. Unlike prior opponent modeling approaches, ours is fundamentally game theoretic: it takes advantage of recent algorithms for automated abstraction and equilibrium computation rather than relying on domain-specific prior distributions, historical data, or a handcrafted set of features. Experiments in limit Texas Hold'em, the most studied imperfect-information game in computer science, show that our algorithm achieves significantly higher win rates than an approximate-equilibrium strategy against several opponents, including competitors from recent AAAI computer poker competitions.
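The loop described in the abstract (observe action frequencies, blend them with an equilibrium strategy, best-respond to the result) can be illustrated on a toy game. The sketch below is not the paper's algorithm: it uses a one-shot matrix game (rock-paper-scissors) instead of an extensive-form game, and the pseudo-count blending rule, the `prior_weight` parameter, and the function names `opponent_model` and `best_response` are all assumptions made for illustration.

```python
from collections import Counter


def opponent_model(equilibrium, observed, prior_weight=5.0):
    """Blend an equilibrium prior with observed action counts.

    Assumption: the equilibrium strategy acts like `prior_weight`
    pseudo-observations, so the model starts at equilibrium and
    drifts toward the observed frequencies as data accumulates.
    """
    total = sum(observed.values())
    return {
        action: (prior_weight * prob + observed.get(action, 0)) / (prior_weight + total)
        for action, prob in equilibrium.items()
    }


def best_response(payoffs, model):
    """Return our action with the highest expected payoff against the model."""
    def expected(our_action):
        return sum(model[a] * payoffs[our_action][a] for a in model)
    return max(payoffs, key=expected)


# Toy game: payoffs[our_action][opponent_action] from our point of view.
equilibrium = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
payoffs = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}

# Hypothetical observations: the opponent has favored rock so far.
observed = Counter(["rock", "rock", "paper", "rock"])
model = opponent_model(equilibrium, observed)
print(best_response(payoffs, model))  # paper
```

With no observations the model equals the equilibrium strategy; each new observation shifts it toward the opponent's empirical frequencies, and recomputing `best_response` after every update mirrors the continual real-time updating the abstract mentions.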

Tuomas Sandholm | Sam Ganzfried
