Opponent modeling and exploitation in poker using evolved recurrent neural networks

As a classic example of imperfect-information games, Heads-Up No-Limit Texas Hold'em (HUNL) has been studied extensively in recent years. While state-of-the-art approaches based on Nash equilibrium have been successful, they lack the ability to model and exploit opponents effectively. This paper presents an evolutionary approach to discovering opponent models based on recurrent neural networks (LSTMs) and Pattern Recognition Trees. Experimental results show that poker agents built with this method can adapt to opponents never seen in training and exploit weak strategies far more effectively than Slumbot 2017, one of the cutting-edge Nash-equilibrium-based poker agents. In addition, agents evolved through play against relatively weak rule-based opponents tied statistically with Slumbot in heads-up matches. The proposed approach is thus a promising new direction for building high-performance adaptive agents in HUNL and other imperfect-information games.
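To make the core idea concrete, the following is a minimal, hypothetical sketch of gradient-free evolution of an LSTM opponent model: network weights are mutated and selected by how well the network predicts a rule-based opponent's next action from the action history. It is a toy illustration under assumed dimensions and a made-up cyclic opponent, not the paper's actual architecture, fitness function, or Pattern Recognition Trees.

```python
import numpy as np

# Hypothetical sketch: evolve LSTM weights (no gradients) to predict a
# rule-based opponent's next action from its action history.
# All sizes, the opponent policy, and the (1+8) ES are illustrative choices.

rng = np.random.default_rng(0)
H, X, A = 8, 3, 3  # hidden size, input size (one-hot action), action count

def init_params():
    shapes = {"W": (4 * H, X + H), "b": (4 * H,), "Wo": (A, H), "bo": (A,)}
    return {k: rng.normal(0.0, 0.3, s) for k, s in shapes.items()}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_seq(p, actions):
    """Run the LSTM over one-hot actions; return predicted next-action ids."""
    h, c, preds = np.zeros(H), np.zeros(H), []
    for a in actions:
        x = np.zeros(X); x[a] = 1.0
        z = p["W"] @ np.concatenate([x, h]) + p["b"]
        i, f, o, g = np.split(z, 4)                 # input/forget/output/cell gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        preds.append(int(np.argmax(p["Wo"] @ h + p["bo"])))
    return preds

def fitness(p, seq):
    # Fraction of next actions predicted correctly.
    preds = predict_seq(p, seq[:-1])
    return float(np.mean([pr == t for pr, t in zip(preds, seq[1:])]))

def mutate(p, sigma=0.1):
    return {k: v + rng.normal(0.0, sigma, v.shape) for k, v in p.items()}

# Toy opponent: deterministically cycles fold -> call -> raise (0, 1, 2).
seq = [t % 3 for t in range(60)]

best = init_params()
best_fit = fitness(best, seq)
for gen in range(200):                              # simple (1+8) evolution strategy
    for child in (mutate(best) for _ in range(8)):
        f = fitness(child, seq)
        if f >= best_fit:
            best, best_fit = child, f

print(f"best prediction accuracy: {best_fit:.2f}")
```

In the actual paper the fitness signal would come from winnings in simulated play rather than supervised next-action accuracy; this sketch only shows the evolutionary mechanics of adapting a recurrent opponent model without backpropagation.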
