Belief-State Monte Carlo Tree Search for Phantom Go

Phantom Go is a variant of Go with imperfect information. It is a challenging domain for AI because of the great uncertainty introduced by hidden information and the high game complexity inherited from Go. To address this imperfect-information game with its large game-tree complexity, this paper puts forward a general search framework named belief-state Monte Carlo tree search (BS-MCTS). BS-MCTS incorporates belief states into Monte Carlo Tree Search, where a belief state is a notion derived from philosophy that represents the probability that a speculation accords with reality. In BS-MCTS, a belief-state tree is constructed in which each node is a belief state, and search proceeds in accordance with the beliefs. The beliefs are learned from heuristic information during search, and we propose two specific methods, Opponent Guessing and Opponent Predicting, to describe this learning mechanism. In addition, possible improvements of the framework are investigated, such as incremental updating and the all-moves-as-first (AMAF) heuristic. Technical details of applying BS-MCTS to Phantom Go are presented, especially concerning the inference strategy. We examine the playing strength of BS-MCTS and AMAF-BS-MCTS in Phantom Go by varying search parameters, and we also verify the proposed improvements.
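The core idea of the abstract can be sketched in a few lines: each search node carries a belief (a distribution over hidden states consistent with the observations), selection uses bandit-style statistics, each rollout determinizes the hidden information by sampling from the belief, and beliefs are adjusted from heuristic feedback. The following is a minimal toy sketch of that idea, not the paper's implementation; the names `BeliefNode`, `bs_mcts`, and the simple mixing-style `update_belief` (a stand-in for Opponent Guessing/Predicting) are our own illustrative assumptions.

```python
import math
import random


class BeliefNode:
    """A node of the belief-state tree: search statistics plus a belief,
    i.e. a probability distribution over the hidden states consistent
    with the observations that lead to this node."""

    def __init__(self, belief):
        self.belief = dict(belief)   # hidden state -> probability
        self.children = {}           # move -> BeliefNode
        self.visits = 0
        self.value = 0.0             # sum of playout rewards


def ucb_select(node, c=1.4):
    """UCB1 selection over the node's children (Kocsis-Szepesvari style)."""
    return max(
        node.children.items(),
        key=lambda kv: kv[1].value / (kv[1].visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)),
    )


def sample_hidden(node, rng):
    """Determinize: sample a concrete hidden state according to the belief."""
    states = list(node.belief)
    return rng.choices(states, weights=[node.belief[s] for s in states])[0]


def update_belief(belief, favoured_state, lr=0.1):
    """Illustrative belief learning: mix a little probability mass toward a
    state that heuristic feedback made more plausible, then renormalize."""
    new = {s: p * (1 - lr) for s, p in belief.items()}
    new[favoured_state] += lr
    z = sum(new.values())
    return {s: p / z for s, p in new.items()}


def bs_mcts(root, moves, reward, iters=2000, seed=0):
    """One-ply belief-state MCTS on a toy one-shot game."""
    rng = random.Random(seed)
    for m in moves:                           # expand all moves once
        root.children[m] = BeliefNode(root.belief)
    for _ in range(iters):
        move, child = ucb_select(root)        # selection by UCB
        state = sample_hidden(child, rng)     # determinization by belief
        r = reward(state, move)               # "playout": one-step payoff
        root.visits += 1                      # backpropagation
        child.visits += 1
        child.value += r
    return max(root.children, key=lambda m: root.children[m].visits)


# Toy usage: the hidden state is 'A' or 'B', believed 80%/20%;
# move 'x' pays off only under 'A', move 'y' only under 'B'.
payoff = {('A', 'x'): 1, ('A', 'y'): 0, ('B', 'x'): 0, ('B', 'y'): 1}
best = bs_mcts(BeliefNode({'A': 0.8, 'B': 0.2}), ['x', 'y'],
               lambda s, m: payoff[(s, m)])
```

Under the 80/20 belief, move 'x' has the higher expected payoff, so the search concentrates its visits there; the real framework differs in that it searches a full belief-state tree and learns the beliefs during search rather than fixing them in advance.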
