State Aggregation in Monte Carlo Tree Search
[1] Carmel Domshlak, et al. Friends or Foes? An AI Planning Perspective on Abstraction and Search, 2006, ICAPS.
[2] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[3] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[4] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[5] Alan Fern, et al. UCT for Tactical Assault Planning in Real-Time Strategy Games, 2009, IJCAI.
[6] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation, 2006, Math. Oper. Res.
[7] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[8] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[9] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[11] Nan Jiang, et al. Improving UCT planning via approximate homomorphisms, 2014, AAMAS.
[12] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.