UCT is a Monte-Carlo planning algorithm that, within a given amount of time, computes near-optimal solutions for Markovian decision processes with large state spaces. Since its publication in 2006, it has gained much attention from the research community and has been used in many applications, because it significantly improves the effectiveness of Monte-Carlo planning. This paper proposes a modification of the UCT algorithm that prunes certain actions of the Markovian decision process, together with their associated states, during the Monte-Carlo planning computation. The pruning is based on properties of the UCB algorithms underlying UCT. This paper proves that the pruned actions and states are highly unlikely to lie on the solution path returned by the UCT algorithm, so the modified algorithm is almost as good as the original. Additionally, the pruning may reduce the size of the Markovian decision process state space and thus improve the effectiveness of the original algorithm. Experimental results in computer Go demonstrate the effectiveness of pruning in the UCT algorithm.
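To make the idea of UCB-based pruning concrete, the following is a minimal sketch at a single decision node. It uses the standard UCB1 index (empirical mean plus an exploration bonus). The pruning rule shown here, which discards an action whose upper confidence bound falls below the best lower confidence bound among its siblings, is an illustrative assumption and not necessarily the criterion proved in the paper. The names `ucb1_index`, `prune_actions`, and the `stats` layout are hypothetical.

```python
import math

C = math.sqrt(2)  # exploration constant; sqrt(2) gives the classic UCB1 bonus


def confidence_radius(pulls, total_pulls, c=C):
    """Width of the confidence interval around an action's empirical mean."""
    if pulls == 0:
        return math.inf  # an untried action is maximally uncertain
    return c * math.sqrt(math.log(total_pulls) / pulls)


def ucb1_index(mean, pulls, total_pulls, c=C):
    """UCB1 index: empirical mean reward plus the exploration bonus."""
    return mean + confidence_radius(pulls, total_pulls, c)


def prune_actions(stats, c=C):
    """Return the set of actions kept after pruning.

    stats maps each action to (mean_reward, pulls).
    ASSUMED rule (illustrative only): drop an action when its upper
    confidence bound is below the best lower confidence bound.
    """
    total = sum(pulls for _, pulls in stats.values())
    radius = {a: confidence_radius(p, total, c) for a, (_, p) in stats.items()}
    best_lower = max(m - radius[a] for a, (m, _) in stats.items())
    return {a for a, (m, _) in stats.items() if m + radius[a] >= best_lower}


def select_action(stats, c=C):
    """Pick the surviving action with the highest UCB1 index."""
    total = sum(pulls for _, pulls in stats.values())
    kept = prune_actions(stats, c)
    return max(kept, key=lambda a: ucb1_index(stats[a][0], stats[a][1], total, c))


if __name__ == "__main__":
    # Hypothetical node statistics: action -> (mean reward, visit count).
    node_stats = {"a": (0.9, 1000), "b": (0.1, 1000), "c": (0.5, 0)}
    print(sorted(prune_actions(node_stats)))
    print(select_action(node_stats))
```

In this sketch an untried action is never pruned, because its confidence interval is unbounded. In a full UCT implementation the same check would be applied at every node of the search tree, and pruning an action would also discard the subtree of states reachable only through it.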