Paper information - DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games


An imperfect-information game is a game in which players hold asymmetric information. Such games are more common in real life than perfect-information games. Artificial intelligence (AI) for imperfect-information games such as poker has made considerable progress in recent years. The success of superhuman poker AIs such as Libratus and DeepStack has drawn researchers' attention to poker. However, the lack of open-source code has limited the development of Texas hold’em AI to some extent. This article introduces DecisionHoldem, a high-level AI for heads-up no-limit Texas hold’em that uses safe depth-limited subgame solving and considers the possible ranges of the opponent’s private hands to reduce the exploitability of its strategy. Experimental results show that DecisionHoldem defeats Slumbot, the strongest openly available agent in heads-up no-limit Texas hold’em, and Openstack, a high-level reproduction of DeepStack, by more than 730 mbb/h and 700 mbb/h, respectively (mbb/h: thousandths of a big blind per hand). Moreover, we release the source code and tools of DecisionHoldem to promote AI development in imperfect-information games.
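The mbb/h figures in the abstract are win rates normalized by the big blind and the number of hands played. As a minimal sketch of that arithmetic (the function name `mbb_per_hand` and the numbers below are hypothetical illustrations, not part of the DecisionHoldem release or its reported results), the conversion from net chips won to mbb/h could look like this:

```python
def mbb_per_hand(net_chips: float, big_blind: float, hands: int) -> float:
    """Average win rate in milli-big-blinds per hand (mbb/h).

    Hypothetical helper for illustration; not taken from DecisionHoldem.
    """
    if hands <= 0:
        raise ValueError("hands must be positive")
    if big_blind <= 0:
        raise ValueError("big_blind must be positive")
    # Net winnings in big blinds, averaged per hand, scaled to thousandths.
    return 1000.0 * net_chips / (big_blind * hands)


# Made-up numbers: winning 5,000 chips over 200 hands with a 100-chip big blind.
example = mbb_per_hand(net_chips=5_000, big_blind=100, hands=200)
print(example)  # 250.0
```

Under this definition, a win rate of 1,000 mbb/h corresponds to winning one big blind per hand on average.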
