The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games

Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address it with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naive Sampling, which is based on the Combinatorial Multi-armed Bandit (CMAB) problem, a variant of the Multi-armed Bandit problem. We present a new MCTS algorithm based on Naive Sampling, called NaiveMCTS, and evaluate it in the context of real-time strategy (RTS) games. Our results show that as the branching factor grows, NaiveMCTS increasingly outperforms the other algorithms evaluated.
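To make the sampling idea concrete, below is a minimal Python sketch of a naive-sampling style policy for a CMAB. It is not the paper's implementation: the class name NaiveSampler, the parameters eps_local and eps_global, and the epsilon-greedy choices are illustrative assumptions. The sketch assumes the reward of a combined action decomposes approximately as a sum over its components (one component per unit, say), and it combines per-component bandits, used to explore new combinations, with a global bandit over combinations already tried.

```python
import random
from collections import defaultdict

# Sketch of a naive-sampling style policy for a Combinatorial Multi-Armed Bandit.
# A macro-arm is a tuple of component choices (e.g. one action per unit); its
# reward is (naively) assumed to decompose as a sum of per-component contributions.
class NaiveSampler:
    def __init__(self, component_arms, eps_local=0.25, eps_global=0.25):
        # component_arms[i] is the list of legal values for component i
        self.component_arms = component_arms
        self.eps_local = eps_local    # exploration rate of the per-component bandits
        self.eps_global = eps_global  # probability of exploring a new combination
        self.local_stats = [defaultdict(lambda: [0.0, 0]) for _ in component_arms]
        self.global_stats = defaultdict(lambda: [0.0, 0])  # stats per full macro-arm

    def _local_pick(self, i):
        # epsilon-greedy over the values of component i
        stats = self.local_stats[i]
        if random.random() < self.eps_local or not stats:
            return random.choice(self.component_arms[i])
        return max(stats, key=lambda a: stats[a][0] / max(stats[a][1], 1))

    def select(self):
        # explore: assemble a new macro-arm from the local bandits;
        # exploit: replay the best macro-arm observed so far
        if random.random() < self.eps_global or not self.global_stats:
            return tuple(self._local_pick(i) for i in range(len(self.component_arms)))
        return max(self.global_stats,
                   key=lambda m: self.global_stats[m][0] / max(self.global_stats[m][1], 1))

    def update(self, macro_arm, reward):
        # credit the observed reward both to the full combination and to each component choice
        g = self.global_stats[macro_arm]
        g[0] += reward; g[1] += 1
        for i, a in enumerate(macro_arm):
            s = self.local_stats[i][a]
            s[0] += reward; s[1] += 1

# Hypothetical usage: three units, each with a small set of actions.
sampler = NaiveSampler([["move", "attack"], ["move", "attack"], ["idle", "attack"]])
for _ in range(100):
    arm = sampler.select()
    reward = random.random()  # stand-in for a game-state evaluation
    sampler.update(arm, reward)
```

In an MCTS setting such as NaiveMCTS, a policy along these lines would replace per-node arm selection over the full combinatorial action space; the exact selection and update rules used in the paper may differ from this sketch.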
