DeepStack: Expert-level artificial intelligence in heads-up no-limit poker

Computer code based on continual problem re-solving beats human professional poker players at a two-player variant of poker.

Artificial intelligence masters poker

Computers can beat humans at games as complex as chess or Go. In these and similar games, both players have access to the same information, as displayed on the board. Although computers have the ultimate poker face, it has been tricky to teach them to be good at poker, where players cannot see their opponents' cards. Moravčík et al. built a program dubbed DeepStack that beat professional poker players at a two-player poker variant called heads-up no-limit Texas hold'em. Instead of devising its strategy beforehand, DeepStack recalculated it at each step, taking into account the current state of the game. The principles behind DeepStack may enable advances in solving real-world problems that involve information asymmetry. Science, this issue p. 508

Abstract

Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker, the quintessential game of imperfect information, is a long-standing challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated, with statistical significance, professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches.
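The core loop the abstract describes, continual re-solving, can be illustrated briefly: rather than committing to a strategy computed in advance, the player re-solves a depth-limited lookahead of the current situation at every decision point, using counterfactual regret minimization and substituting a learned value function (the "intuition") for everything below the depth limit. The sketch below is a minimal, hypothetical rendering of that control flow under these assumptions, not the authors' implementation; every name here (value_net, resolve, the toy regret-matching loop) is invented for illustration, and the network is stubbed with random values so the example runs standalone.

```python
# Hypothetical sketch of DeepStack-style continual re-solving. The real
# system solves a depth-limited lookahead tree with counterfactual regret
# minimization (CFR) and a deep counterfactual value network at the limit.

import random

def value_net(public_state, my_range, opp_cfvs):
    # Stand-in for the learned "intuition": a deep network trained by
    # self-play that maps the public state and both players' range
    # information to counterfactual values. Stubbed with noise here so
    # the sketch runs on its own.
    return [random.uniform(-1.0, 1.0) for _ in my_range]

def resolve(public_state, my_range, opp_cfvs, actions, iterations=1000):
    # Re-solve only the current decision: regret matching over the legal
    # actions, with values supplied by the (stubbed) network instead of
    # a search to the end of the game.
    regrets = {a: 0.0 for a in actions}
    strategy_sum = {a: 0.0 for a in actions}
    for _ in range(iterations):
        positive = {a: max(r, 0.0) for a, r in regrets.items()}
        norm = sum(positive.values())
        strategy = {a: positive[a] / norm if norm > 0 else 1.0 / len(actions)
                    for a in actions}
        # A real lookahead would evaluate the successor state reached by
        # each action; this toy just queries the stub per action.
        values = {a: sum(value_net(public_state, my_range, opp_cfvs)) / len(my_range)
                  for a in actions}
        node_value = sum(strategy[a] * values[a] for a in actions)
        for a in actions:
            regrets[a] += values[a] - node_value   # regret update
            strategy_sum[a] += strategy[a]         # accumulate average strategy
    total = sum(strategy_sum.values())
    return {a: s / total for a, s in strategy_sum.items()}

# Between decisions, only the player's own range and the opponent's
# counterfactual values are carried forward, which is what keeps each
# re-solve tractable.
print(resolve(public_state={"pot": 100}, my_range=[0.5, 0.5],
              opp_cfvs=[0.0, 0.0], actions=["fold", "call", "raise"]))
```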
