Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu

AlphaZero has achieved superhuman performance in various perfect-information games such as chess, shogi, and Go. However, directly applying AlphaZero to imperfect-information games (IIGs) is infeasible, because traditional MCTS methods cannot handle the missing information about other players. Several extensions of MCTS to IIGs exist that implicitly or explicitly sample a state for the other players, but because they do not handle private and public information well, their performance is unsatisfactory. In this paper, we extend AlphaZero to multiplayer IIGs by developing a new MCTS method, Action-Prediction MCTS (AP-MCTS). In contrast to traditional MCTS extensions for IIGs, AP-MCTS builds the search tree on public information, adopts a policy-value network to generalize between hidden states, and predicts other players' actions directly. This design bypasses both the inefficiency of sampling and the difficulty of predicting the state of other players. We conduct extensive experiments on the popular three-player card game DouDiZhu to evaluate the performance of AP-MCTS within the AlphaZero framework. When playing against experienced human players, AP-MCTS achieved a 65.65% winning rate, almost twice that of the humans. When compared with state-of-the-art DouDiZhu AIs, the Elo rating of AP-MCTS is 50 to 200 points higher. An ablation study shows that accurate action prediction is the key to AP-MCTS's strong performance.
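The abstract only sketches AP-MCTS at a high level. The Python sketch below is a minimal illustration, under our own assumptions, of how the three stated ingredients might fit together: a search tree keyed on public information, a policy-value network for priors and values, and direct prediction of opponents' actions instead of sampling their hidden hands. Every name here (PublicNode, ap_mcts_search, predict_opponent_action, net.evaluate, and the state attributes to_play, our_seat, terminal, reward) is a hypothetical placeholder, not the authors' implementation.

```python
import math

class PublicNode:
    """Tree node keyed only on public information (bids, played cards, history)."""
    def __init__(self, prior=1.0):
        self.prior = prior        # prior probability from the policy head
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # action -> PublicNode

    def mean_value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_select(node, c_puct=1.5):
    """Select the child that maximizes the standard PUCT score."""
    total = sum(ch.visit_count for ch in node.children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visit_count)
        return ch.mean_value() + u
    return max(node.children.items(), key=score)

def ap_mcts_search(root, root_state, net, predict_opponent_action,
                   legal_moves, apply_move, num_simulations=400):
    """One search from the agent's perspective (sketch, not the authors' code).

    The tree holds only the agent's decision points; whenever the public state
    says it is an opponent's turn, their action is predicted directly instead
    of sampling their hidden cards, and the simulation continues.
    """
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # Selection: walk down the public-information tree.
        while node.children and not state.terminal:
            action, node = puct_select(node)
            state = apply_move(state, action)
            path.append(node)
            # Fast-forward through opponent turns via direct action prediction.
            while not state.terminal and state.to_play != state.our_seat:
                state = apply_move(state, predict_opponent_action(net, state))
        # Expansion and evaluation with the policy-value network.
        if state.terminal:
            value = state.reward                 # outcome from the agent's view
        else:
            priors, value = net.evaluate(state)  # assumed API: (dict, float)
            for a in legal_moves(state):
                node.children[a] = PublicNode(prior=priors.get(a, 1e-3))
        # Backup along the visited path (single-perspective values, simplified).
        for n in path:
            n.visit_count += 1
            n.value_sum += value
    # Play the most-visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```

In this sketch the value bookkeeping is kept single-perspective for brevity; a faithful three-player DouDiZhu implementation would assign rewards per role (landlord vs. peasants) during backup.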
