The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games

Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address it with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naive Sampling, which is based on the Combinatorial Multi-armed Bandit (CMAB) problem, a variant of the Multi-armed Bandit problem. We present a new MCTS algorithm based on Naive Sampling, called NaiveMCTS, and evaluate it in the context of real-time strategy (RTS) games. Our results show that as the branching factor grows, NaiveMCTS increasingly outperforms the other algorithms evaluated.
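To make the sampling idea concrete, below is a minimal Python sketch of a naive-sampling style policy for a CMAB. It is not the paper's implementation: the class name NaiveSampler, the parameters eps_local and eps_global, and the epsilon-greedy choices are illustrative assumptions. The sketch assumes the reward of a combined action decomposes approximately as a sum over its components (one component per unit, say), and it combines per-component bandits, used to explore new combinations, with a global bandit over combinations already tried.

```python
import random
from collections import defaultdict

# Sketch of a naive-sampling style policy for a Combinatorial Multi-Armed Bandit.
# A macro-arm is a tuple of component choices (e.g. one action per unit); its
# reward is (naively) assumed to decompose as a sum of per-component contributions.
class NaiveSampler:
    def __init__(self, component_arms, eps_local=0.25, eps_global=0.25):
        # component_arms[i] is the list of legal values for component i
        self.component_arms = component_arms
        self.eps_local = eps_local    # exploration rate of the per-component bandits
        self.eps_global = eps_global  # probability of exploring a new combination
        self.local_stats = [defaultdict(lambda: [0.0, 0]) for _ in component_arms]
        self.global_stats = defaultdict(lambda: [0.0, 0])  # stats per full macro-arm

    def _local_pick(self, i):
        # epsilon-greedy over the values of component i
        stats = self.local_stats[i]
        if random.random() < self.eps_local or not stats:
            return random.choice(self.component_arms[i])
        return max(stats, key=lambda a: stats[a][0] / max(stats[a][1], 1))

    def select(self):
        # explore: assemble a new macro-arm from the local bandits;
        # exploit: replay the best macro-arm observed so far
        if random.random() < self.eps_global or not self.global_stats:
            return tuple(self._local_pick(i) for i in range(len(self.component_arms)))
        return max(self.global_stats,
                   key=lambda m: self.global_stats[m][0] / max(self.global_stats[m][1], 1))

    def update(self, macro_arm, reward):
        # credit the observed reward both to the full combination and to each component choice
        g = self.global_stats[macro_arm]
        g[0] += reward; g[1] += 1
        for i, a in enumerate(macro_arm):
            s = self.local_stats[i][a]
            s[0] += reward; s[1] += 1

# Hypothetical usage: three units, each with a small set of actions.
sampler = NaiveSampler([["move", "attack"], ["move", "attack"], ["idle", "attack"]])
for _ in range(100):
    arm = sampler.select()
    reward = random.random()  # stand-in for a game-state evaluation
    sampler.update(arm, reward)
```

In an MCTS setting such as NaiveMCTS, a policy along these lines would replace per-node arm selection over the full combinatorial action space; the exact selection and update rules used in the paper may differ from this sketch.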
