Paper information - Learning opening books in partially observable games: Using random seeds in Phantom Go

Many artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, perhaps contrary to intuition, it is far from negligible. We then apply two existing algorithms for selecting good seeds and good probability distributions over seeds, which mainly amounts to learning an opening book. We apply this to Phantom Go, which, like all phantom games, is hard for opening book learning. We improve the winning rate from 50% to 70% in 5×5 against the same AI, and from approximately 0% to 40% in 5×5, 7×7 and 9×9 against a stronger (learning) opponent.
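The abstract does not name the two seed-selection algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical matrix `wins`, where `wins[i][j]` is the score (1 win, 0.5 draw, 0 loss) of the AI run with seed `i` against an opponent run with seed `j`. The helper `best_seed` picks the single seed with the highest average score, and `fictitious_play` approximates a probability distribution over seeds that is robust to the opponent's choice. Both function names, and the use of fictitious play, are assumptions made for illustration.

```python
def best_seed(wins):
    """Return the index of the seed (row) with the highest average score."""
    means = [sum(row) / len(row) for row in wins]
    return max(range(len(wins)), key=means.__getitem__)


def fictitious_play(wins, iterations=10000):
    """Approximate a maximin distribution over row seeds by fictitious play.

    Each side repeatedly plays a best response to the other side's
    empirical mix of past choices; the row player's empirical
    frequencies are returned as the seed distribution.
    """
    n_rows, n_cols = len(wins), len(wins[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = 1  # arbitrary starting choice (assumption)
    col_counts[0] = 1
    for _ in range(iterations):
        # Expected (unnormalized) score of each row seed against the column mix.
        row_payoffs = [
            sum(wins[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Expected (unnormalized) row score conceded by each column seed.
        col_payoffs = [
            sum(wins[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Row player maximizes its score, column player minimizes it.
        row_counts[max(range(n_rows), key=row_payoffs.__getitem__)] += 1
        col_counts[min(range(n_cols), key=col_payoffs.__getitem__)] += 1
    total = sum(row_counts)
    return [count / total for count in row_counts]


if __name__ == "__main__":
    # Toy cyclic example: each seed beats one other seed and loses to the third.
    toy = [
        [0.5, 0.0, 1.0],
        [1.0, 0.5, 0.0],
        [0.0, 1.0, 0.5],
    ]
    print("best single seed:", best_seed(toy))
    print("seed distribution:", fictitious_play(toy))
```

In the toy matrix no single seed is safe, since each one loses to some opponent seed, which is the kind of situation where a distribution over seeds can help against an opponent that adapts.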
