All Simulations Are Not Equal: Simulation Reweighing for Imperfect Information Games

Imperfect information games are challenging benchmarks for artificial intelligence systems: the ability to reason and plan under uncertainty is a key step toward general AI. Traditionally, large numbers of simulations are used in imperfect information games, yet they sometimes perform sub-optimally due to large state and action spaces. In this work, we propose a simulation reweighing mechanism based on neural networks. It performs backward verification against previously observed public actions and assigns belief weights to simulations drawn from the information set of the current observation, using an incomplete state solver network (ISSN). We apply simulation reweighing to the playing phase of contract bridge and show that it outperforms previous state-of-the-art Monte Carlo simulation based methods, achieving better play per decision.
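To make the mechanism concrete, the following is a minimal sketch of how such reweighing could be wired up, assuming the ISSN is exposed as a callable issn(state, public_actions) that returns an unnormalized plausibility score for a candidate hidden state, and a rollout_value function that runs a Monte Carlo rollout from one candidate state; these names and signatures are illustrative assumptions, not details from the paper.

import numpy as np

def reweigh_simulations(candidate_states, public_actions, issn):
    """Assign belief weights to sampled hidden states via backward verification.

    candidate_states: hidden-state samples drawn from the information set
        of the current observation (hypothetical representation).
    public_actions: the sequence of previously observed public actions.
    issn: hypothetical network mapping (state, actions) -> unnormalized
        plausibility score for the observed actions under that state.
    """
    scores = np.array([issn(s, public_actions) for s in candidate_states])
    # Normalize the plausibility scores into a belief distribution.
    weights = scores / scores.sum()
    return weights

def weighted_monte_carlo_value(candidate_states, weights, rollout_value):
    """Estimate a decision's value as a belief-weighted average of rollouts."""
    values = np.array([rollout_value(s) for s in candidate_states])
    return float(np.dot(weights, values))

Normalizing the ISSN scores turns backward verification into a posterior-like belief over the information set, so rollouts from hidden states that are inconsistent with the observed public actions contribute little to the final value estimate, rather than being averaged in uniformly as in plain Monte Carlo simulation.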
