Paper Information - Backpropagation Modification in Monte-Carlo Game Tree Search


The UCT algorithm, proposed by Kocsis et al. [3], applies the multi-armed bandit problem to tree-structured search spaces and has achieved remarkable success in several challenging fields [2]. In UCT, Monte-Carlo simulations are performed under the guidance of the UCB1 formula, and their results are averaged to evaluate a given action. We observe that, as more simulations are performed, later ones usually yield more accurate results, partly because later simulations search deeper and partly because more results are available to direct them. This paper presents a new method that improves the performance of UCT by increasing the feedback value of later simulations. Experimental results in the classical game of Go show that our approach significantly improves the performance of Monte-Carlo simulations when exponential models are used.
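The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the code below assumes a simple geometric model in which each new simulation result counts `growth` times as much as the one before it. The class `NodeStats`, the parameter `growth`, and the function `ucb1` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import math


class NodeStats:
    """Running statistics for one action (hypothetical helper, not from the paper)."""

    def __init__(self, growth=1.01):
        # Assumed exponential model: each later result weighs `growth` times
        # as much as the previous one. growth == 1.0 gives the plain average
        # used by standard UCT.
        self.growth = growth
        self.visits = 0
        self.weighted_sum = 0.0
        self.weight_total = 0.0

    def update(self, reward):
        """Back up one simulation result, favoring later results."""
        self.visits += 1
        # Shrink the old totals instead of growing the new weight, so the
        # numbers stay bounded even after many simulations.
        self.weighted_sum = self.weighted_sum / self.growth + reward
        self.weight_total = self.weight_total / self.growth + 1.0

    def value(self):
        """Weighted average of all results seen so far."""
        if self.weight_total == 0.0:
            return 0.0
        return self.weighted_sum / self.weight_total


def ucb1(stats, parent_visits, c=math.sqrt(2)):
    """UCB1 score: estimated value plus an exploration bonus."""
    if stats.visits == 0:
        return math.inf  # unvisited actions are tried first
    bonus = c * math.sqrt(math.log(parent_visits) / stats.visits)
    return stats.value() + bonus
```

With `growth=2.0` and results 0, 0, 1 in that order, the weighted value is 4/7, compared with 1/3 for the plain average, which shows how the latest simulation dominates the estimate under this assumed model.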

Zhiqing Liu | Fan Xie

[1] Brian Sheppard, et al. World-championship-caliber Scrabble, 2002, Artif. Intell.

[2] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.

[3] Martin Müller, et al. Computer Go, 2002, Artif. Intell.

[4] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.

[5] Jonathan Schaeffer, et al. The challenge of poker, 2002, Artif. Intell.

[6] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.

[7] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS 2006.

[8] Brian Sheppard. World-championship-caliber Scrabble, 2002.

[9] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[10] H. Jaap van den Herik, et al. Progressive Strategies for Monte-Carlo Tree Search, 2008.

[11] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.