On Combinatorial Actions and CMABs with Linear Side Information

Online planning algorithms are a tool of choice for sequential decision problems with combinatorial search spaces. Many such problems, however, also exhibit combinatorial action spaces, and standard planning algorithms do not cope well with this dimension of the curse of dimensionality. Following a recent line of work on combinatorial multi-armed bandit (CMAB) problems, we propose a novel CMAB planning scheme, as well as two specific instances of this scheme dedicated to exploiting what is called linear side information. Using a representative strategy game as a benchmark, we show that the resulting algorithms compare very favorably with the state of the art.
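
To make the CMAB-with-linear-side-information setting concrete, the following is a minimal illustrative sketch, not the paper's scheme. It assumes a combinatorial action is a size-k subset of n base arms, that each base arm carries a known feature vector, and that expected per-arm rewards are linear in those features for an unknown shared parameter vector; a LinUCB-style ridge estimator then scores arms optimistically and the agent plays the top-k. The class name LinearCMAB and all parameter names are hypothetical.

    # Illustrative sketch only: a LinUCB-style learner for a CMAB whose
    # combinatorial action is a size-k subset of base arms, under the
    # ASSUMPTION of a shared linear reward model E[r_i] = theta^T x_i.
    import numpy as np

    class LinearCMAB:
        """Hypothetical top-k combinatorial bandit with linear side info."""

        def __init__(self, dim, alpha=1.0, ridge=1.0):
            self.alpha = alpha                  # exploration width
            self.A = ridge * np.eye(dim)        # regularized Gram matrix
            self.b = np.zeros(dim)              # feature-weighted reward sums

        def select(self, features, k):
            """Pick the k arms with the highest optimistic (UCB) scores."""
            theta = np.linalg.solve(self.A, self.b)
            A_inv = np.linalg.inv(self.A)
            # Quadratic form x_i^T A^{-1} x_i per arm, for the bonus term.
            widths = np.einsum("ij,jk,ik->i", features, A_inv, features)
            scores = features @ theta + self.alpha * np.sqrt(widths)
            return np.argsort(scores)[-k:]

        def update(self, features, chosen, rewards):
            """Ridge-regression update from observed per-arm rewards."""
            for i, r in zip(chosen, rewards):
                x = features[i]
                self.A += np.outer(x, x)
                self.b += r * x

    # Toy usage: 20 base arms with 5-dim features, size-3 actions.
    rng = np.random.default_rng(0)
    theta_true = rng.normal(size=5)
    features = rng.normal(size=(20, 5))
    agent = LinearCMAB(dim=5)
    for t in range(100):
        chosen = agent.select(features, k=3)
        rewards = features[chosen] @ theta_true + 0.1 * rng.normal(size=3)
        agent.update(features, chosen, rewards)

The key design point the sketch illustrates is why linear side information helps in the combinatorial setting: a single shared estimate of theta transfers reward evidence across all arms, so the learner never needs to explore each of the exponentially many subsets separately.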
