Combining Prediction of Human Decisions with ISMCTS in Imperfect Information Games

Monte Carlo Tree Search (MCTS) has been extended to many imperfect information games. However, due to the added complexity that uncertainty introduces, these adaptations have not reached the same level of practical success as their perfect information counterparts. In this paper we consider the development of agents that perform well against humans in imperfect information games with partially observable actions. We introduce Semi-Determinized MCTS (SDMCTS), a variant of the Information Set MCTS (ISMCTS) algorithm. SDMCTS first builds a predictive model of the unobservable portion of the opponent's actions from historical behavioral data. It then performs simulations on an instance of the game in which the unobservable portion of the opponent's actions is determined, allowing the predictive model to reduce the uncertainty faced during search. We present an implementation of SDMCTS for the Cheat Game, a well-known card game with partially observable (and often deceptive) actions. Results from experiments with 120 subjects playing head-to-head Cheat Games against our SDMCTS agents suggest that SDMCTS performs well against humans, and that its performance improves as the predictive model's accuracy increases.
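
The following is a minimal sketch of the semi-determinization idea described above, not the paper's implementation. It assumes a toy Cheat-like decision (accuse a possibly bluffing opponent, or pass) and a hypothetical predictive model reduced to a single bluff probability `predict_bluff_prob`; on each search iteration the hidden portion of the opponent's action is sampled from that model, yielding a fully observable instance on which standard UCT runs.

```python
import math
import random

class DeterminizedState:
    """A fully observable game instance: the hidden portion of the
    opponent's action (did they bluff?) has been fixed by sampling."""

    def __init__(self, opponent_bluffed):
        self.opponent_bluffed = opponent_bluffed
        self.done = False
        self.reward = 0.0

    def legal_actions(self):
        return [] if self.done else ["accuse", "pass"]

    def step(self, action):
        nxt = DeterminizedState(self.opponent_bluffed)
        nxt.done = True
        if action == "accuse":
            nxt.reward = 1.0 if self.opponent_bluffed else -1.0
        return nxt

def semi_determinize(predict_bluff_prob):
    """Sample the unobservable part of the opponent's action from a
    predictive model (here collapsed to one assumed probability)."""
    return DeterminizedState(random.random() < predict_bluff_prob)

class Node:
    def __init__(self, action=None, parent=None):
        self.action, self.parent = action, parent
        self.children, self.visits, self.total = [], 0, 0.0

def uct_child(node, c=1.4):
    # Standard UCB1 selection over the node's children.
    return max(node.children,
               key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def sdmcts(predict_bluff_prob, iterations=2000):
    root = Node()
    for _ in range(iterations):
        # Each iteration searches a freshly semi-determinized instance.
        state = semi_determinize(predict_bluff_prob)
        if not root.children:
            root.children = [Node(a, root) for a in state.legal_actions()]
        # Visit each child once before applying UCB1.
        if any(ch.visits == 0 for ch in root.children):
            child = next(ch for ch in root.children if ch.visits == 0)
        else:
            child = uct_child(root)
        outcome = state.step(child.action)  # one-step game: trivial rollout
        for n in (child, root):             # backpropagation
            n.visits += 1
            n.total += outcome.reward
    return max(root.children, key=lambda ch: ch.visits).action

if __name__ == "__main__":
    # With a model predicting an 80% bluff rate, accusing has expected
    # value 0.8 - 0.2 = 0.6 > 0, so the search settles on "accuse".
    print(sdmcts(predict_bluff_prob=0.8))
```

In the full algorithm the tree is deeper and the predictive model conditions on game context, but the structure is the same: determinize the hidden action by sampling from the learned model, then search the resulting perfect-information instance, so that a more accurate model directly yields better-targeted simulations.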
