Trade-Offs in Sampling-Based Adversarial Planning

The Upper Confidence bounds for Trees (UCT) algorithm has in recent years captured the attention of the planning and game-playing community because of its notable success in the game of Go. However, attempts to reproduce similar levels of performance in domains where Minimax-style algorithms excel have been largely unsuccessful, which makes comparative studies of the two approaches difficult. In this paper, we study UCT in the game of Mancala, which to our knowledge is the first domain where both search algorithms perform quite well with minimal enhancement. We focus on the three key components of UCT in its purest form (targeted node expansion, state value estimation via playouts, and averaging backups) and examine how each contributes to the overall performance of the algorithm. We study the trade-offs involved in performing these steps in alternative ways. Finally, we demonstrate a novel hybrid approach to enhancing UCT that exploits its superior decision accuracy in regions of the search space with few terminal nodes.
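To make the three components of plain UCT concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical toy game in place of Mancala (a single pile of stones from which players alternately remove 1 to 3, and whoever takes the last stone wins), uniformly random playouts, the standard UCB1 selection rule with an assumed exploration constant c = sqrt(2), and illustrative names (`Node`, `legal_moves`, `apply_move`, `playout`, `uct_search`) that do not come from the paper.

```python
import math
import random

# Hypothetical toy domain standing in for Mancala (an assumption, not the paper's):
# one pile of stones, players alternately remove 1-3 stones, and whoever takes
# the last stone wins. A state is (stones_left, player_to_move), players 0 and 1.


def legal_moves(state):
    stones, _ = state
    return [k for k in (1, 2, 3) if k <= stones]


def apply_move(state, move):
    stones, player = state
    return (stones - move, 1 - player)


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                      # move that led from parent to here
        self.children = []
        self.untried = legal_moves(state)     # moves not yet expanded
        self.visits = 0
        # Sum of playout rewards, scored for the player who moved INTO this node
        # (the parent's player to move), so every parent simply maximizes.
        self.total = 0.0


def ucb1(child, parent_visits, c):
    mean = child.total / child.visits         # averaging backup: mean reward
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def playout(state, rng):
    """State value estimation: play uniformly random moves to the end of the game."""
    stones, player = state
    while stones > 0:
        stones -= rng.choice(legal_moves((stones, player)))
        player = 1 - player
    return 1 - player                         # the player who took the last stone


def uct_search(stones, iterations=3000, c=math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node((stones, 0))
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))
        # 2. Targeted expansion: add a single new child per iteration.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(apply_move(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Playout from the newly added (or terminal) node.
        winner = playout(node.state, rng)
        # 4. Averaging backup: each node on the path gains a visit and its reward.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.state[1]:
                node.total += 1.0
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this toy game, positions with a multiple of 4 stones are lost for the player to move, so `uct_search(7)` should recommend taking 3 stones. Each numbered step can be swapped out independently (for example, a heuristic evaluation instead of `playout`, or a minimax-style update instead of the running average); such hypothetical swaps illustrate the kind of alternatives whose trade-offs the abstract refers to.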
