Self-Adaptive Monte Carlo Tree Search in General Game Playing

Many enhancements for Monte Carlo tree search (MCTS) have been applied successfully in general game playing (GGP). MCTS and its enhancements are controlled by multiple parameters that require extensive and time-consuming offline optimization. Moreover, because the games to be played are unknown in advance, offline optimization cannot tune parameters for individual games. This paper proposes a self-adaptive MCTS strategy (SA-MCTS) that integrates into the search a method that automatically tunes search-control parameters online, per game. It presents five allocation strategies that decide how to distribute the available samples among the parameter values to be evaluated. Experiments on multiplayer games with a 1 s play-clock show that, for all allocation strategies, SA-MCTS tuning two parameters performs at least as well as MCTS tuned offline without per-game optimization. The best-performing allocation strategy is the N-Tuple Bandit Evolutionary Algorithm (NTBEA), which also performs well when tuning four parameters. SA-MCTS is thus a successful strategy for domains that require parameter tuning for every single problem, and a valid alternative for domains where offline parameter tuning is costly or infeasible.
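To make the core idea concrete, the sketch below shows how online tuning can be interleaved with the search: before each MCTS simulation an allocation strategy picks values for the tuned parameters, the simulation runs with those values, and its payoff is fed back as the strategy's reward. This sketch uses a flat UCB1 bandit over all parameter-value combinations as the allocation strategy; the value sets, the exploration constant, and the `run_mcts_simulation` hook are illustrative assumptions, not the paper's exact implementation (the paper evaluates five strategies, of which NTBEA performs best).

```python
import math
import random
from itertools import product

# Minimal sketch of one possible SA-MCTS allocation strategy: a flat UCB1
# bandit over discretized parameter combinations. Before each MCTS
# simulation a combination is selected; the simulation's payoff is then
# credited back to the bandit. Value sets, the flat-bandit choice, and
# run_mcts_simulation are illustrative assumptions, not the paper's setup.

C_VALUES = [0.1, 0.3, 0.5, 0.7, 0.9]        # candidate UCT exploration constants
EPSILON_VALUES = [0.0, 0.2, 0.4, 0.6, 0.8]  # candidate epsilon-greedy playout values


class ParameterTuner:
    """UCB1 over the Cartesian product of per-parameter value sets."""

    def __init__(self, value_sets, exploration=0.7):
        self.combinations = list(product(*value_sets))
        self.exploration = exploration
        self.visits = [0] * len(self.combinations)
        self.total_reward = [0.0] * len(self.combinations)
        self.total_visits = 0

    def select(self):
        """Return (index, combination) with the highest UCB1 score."""
        best_idx, best_score = 0, float("-inf")
        for i, n in enumerate(self.visits):
            if n == 0:  # play every combination once before exploiting
                return i, self.combinations[i]
            mean = self.total_reward[i] / n
            bonus = self.exploration * math.sqrt(math.log(self.total_visits) / n)
            if mean + bonus > best_score:
                best_idx, best_score = i, mean + bonus
        return best_idx, self.combinations[best_idx]

    def update(self, idx, reward):
        """Credit the finished simulation's payoff to the chosen combination."""
        self.visits[idx] += 1
        self.total_reward[idx] += reward
        self.total_visits += 1


def run_mcts_simulation(c, epsilon):
    """Hypothetical hook: run one MCTS iteration with the given parameter
    values and return the payoff in [0, 1]. A random stand-in here."""
    return random.random()


if __name__ == "__main__":
    tuner = ParameterTuner([C_VALUES, EPSILON_VALUES])
    for _ in range(1000):  # one tuning step per MCTS simulation
        idx, (c, eps) = tuner.select()
        tuner.update(idx, run_mcts_simulation(c, eps))
```

A flat bandit must enumerate every combination, so it scales poorly as more parameters are tuned; the paper's n-tuple-based NTBEA strategy instead shares statistics across subsets of parameters, which is consistent with its reported good performance when tuning four parameters.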
