Superhuman AI for heads-up no-limit poker: Libratus beats top professionals

Libratus versus humans

Pitting artificial intelligence (AI) against top human players demonstrates just how far AI has come. Brown and Sandholm built a poker-playing AI called Libratus that decisively beat four leading human professionals in the two-player variant of poker called heads-up no-limit Texas hold'em (HUNL). Over nearly 3 weeks, Libratus played 120,000 hands of HUNL against the human professionals, using a three-pronged approach that included precomputing an overall strategy, adapting the strategy to actual gameplay, and learning from its opponent.

Science, this issue p. 418

Abstract

No-limit Texas hold'em is the most popular form of poker. Despite artificial intelligence (AI) successes in perfect-information games, the private information and massive game tree have made no-limit poker difficult to tackle. We present Libratus, an AI that, in a 120,000-hand competition, defeated four top human specialist professionals in heads-up no-limit Texas hold'em, the leading benchmark and long-standing challenge problem in imperfect-information game solving. Our game-theoretic approach features application-independent techniques: an algorithm for computing a blueprint for the overall strategy, an algorithm that fleshes out the details of the strategy for subgames that are reached during play, and a self-improver algorithm that fixes potential weaknesses that opponents have identified in the blueprint strategy.
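Libratus computed its blueprint strategy with a variant of counterfactual regret minimization (CFR). As a rough illustration of the core idea only, and not the authors' implementation, the Python sketch below applies regret matching, the per-decision update rule underlying CFR, to a toy zero-sum game; the payoff matrix, function names, and iteration count are all assumptions chosen for the example.

```python
# Minimal sketch of regret matching, the update rule at the heart of
# counterfactual regret minimization (CFR). This is NOT Libratus's
# implementation; it is an illustrative toy that converges to the
# Nash equilibrium of rock-paper-scissors through self-play.

import numpy as np

# Row player's payoff for rock-paper-scissors; the game is zero-sum,
# so the column player's payoff is the negation.
PAYOFF = np.array([
    [ 0, -1,  1],   # rock     vs rock/paper/scissors
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
], dtype=float)

def regret_matching(cum_regret):
    """Turn cumulative regrets into a strategy: play each action in
    proportion to its positive regret, or uniformly if none is positive."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

def solve(iterations=100_000):
    n = PAYOFF.shape[0]
    regret_p1 = np.zeros(n)
    regret_p2 = np.zeros(n)
    strategy_sum = np.zeros(n)

    for _ in range(iterations):
        s1 = regret_matching(regret_p1)
        s2 = regret_matching(regret_p2)
        strategy_sum += s1  # the AVERAGE strategy converges, not the last iterate

        # Expected payoff of each pure action against the opponent's mix.
        u1 = PAYOFF @ s2         # player 1's action values
        u2 = -(s1 @ PAYOFF)      # player 2's action values (zero-sum)

        # Regret of an action = its value minus the value actually achieved.
        regret_p1 += u1 - s1 @ u1
        regret_p2 += u2 - s2 @ u2

    return strategy_sum / strategy_sum.sum()

if __name__ == "__main__":
    print(solve())  # approaches the uniform (1/3, 1/3, 1/3) equilibrium
```

In two-player zero-sum games, the average of the iterates (not the final strategy) converges to a Nash equilibrium. Full CFR applies this same regret-matching rule at every information set of the game tree using counterfactual values; scaling that idea to the enormous tree of HUNL is what the blueprint and subgame-solving algorithms described above address.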
