Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search

Regret minimization is important in both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative in MCTS to cumulative regret, i.e., the regret accumulated over time. Each type of regret is appropriate in different contexts. Whereas the majority of MCTS research applies the UCT selection policy, which minimizes cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), that minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique and demonstrate the performance of H-MCTS in six distinct two-player games: Amazons, AtariGo, Ataxx, Breakthrough, NoGo, and Pentalath.
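To make the hybrid concrete, below is a minimal Python sketch of the two bandit policies that H-MCTS combines: Sequential Halving, the pure-exploration (simple-regret) policy applied near the root, and UCB1, the selection rule UCT uses to minimize cumulative regret deeper in the tree. This is an illustrative sketch under stated assumptions, not the paper's implementation: the pull() reward oracle, the arm labels, and the visits/value node fields are hypothetical placeholders, and the budget rounding follows the standard description of Sequential Halving.

    import math
    import random

    def sequential_halving(arms, budget, pull):
        # Sequential Halving: spend the budget over ceil(log2(n)) rounds,
        # sampling every surviving arm equally often in each round, then
        # eliminating the worse-scoring half, until one arm remains.
        survivors = list(arms)
        rounds = max(1, math.ceil(math.log2(len(survivors))))
        totals = {a: 0.0 for a in survivors}
        counts = {a: 0 for a in survivors}
        for _ in range(rounds):
            if len(survivors) == 1:
                break
            pulls = max(1, budget // (len(survivors) * rounds))
            for a in survivors:
                for _ in range(pulls):
                    totals[a] += pull(a)  # pull() is an assumed reward oracle
                    counts[a] += 1
            survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
            survivors = survivors[:math.ceil(len(survivors) / 2)]
        return survivors[0]  # the recommended (low simple-regret) arm

    def uct_select(children, parent_visits, c=math.sqrt(2)):
        # UCB1 as used by UCT: mean reward plus an exploration bonus;
        # unvisited children are tried first.
        def ucb(child):
            if child.visits == 0:
                return float("inf")
            return (child.value / child.visits
                    + c * math.sqrt(math.log(parent_visits) / child.visits))
        return max(children, key=ucb)

    # Toy usage with random rewards (hypothetical arms and budget):
    best = sequential_halving(["a", "b", "c", "d"], budget=64,
                              pull=lambda a: random.random())

In H-MCTS, Sequential Halving is applied recursively to the nodes near the root (the SHOT part of the tree), while the remaining subtrees are searched with UCT; the sketch above shows only the two arm-selection rules, not the hybrid scheduling between them.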
