Hierarchical Monte-Carlo Planning

Monte-Carlo Tree Search (MCTS), in particular UCT and its POMDP extension POMCP, has demonstrated excellent performance on many problems. To scale efficiently to large domains, however, one should also exploit hierarchical structure when it is present. In such hierarchical domains, finding rewarded states typically requires deep search, and covering enough of these informative states far from the root becomes computationally expensive for flat, non-hierarchical search methods. We propose novel, scalable MCTS methods that integrate a task hierarchy into the MCTS framework, yielding hierarchical versions of both UCT and POMCP. The new methods do not need to estimate probabilistic models of each subtask; instead, they compute subtask policies purely by sampling. We evaluate the hierarchical MCTS methods in several settings: a hierarchical MDP, a Bayesian model-based hierarchical reinforcement learning problem, and a large hierarchical POMDP.
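
To make the core idea concrete, below is a minimal, illustrative Python sketch of folding a task hierarchy into UCT: at every level, UCB1 selects among a task's child subtasks, and composite subtasks are evaluated by recursively sampling their own executions, so no probabilistic subtask model is ever estimated. This is not the paper's implementation; the `Task` class, the generative simulator interface `sim.step(state, action) -> (next_state, reward)`, and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

class Task:
    """A node in the task hierarchy: either a primitive action or a
    composite subtask with child subtasks and a termination predicate."""
    def __init__(self, name, children=None, primitive_action=None,
                 terminates=lambda s: False):
        self.name = name
        self.children = children or []            # child subtasks (macro-actions)
        self.primitive_action = primitive_action  # set only for leaf tasks
        self.terminates = terminates              # subtask termination predicate

def h_uct(sim, task, state, depth, gamma=0.95, c=1.4, n_iters=50):
    """UCB1 selection over `task`'s children from `state`.
    Returns the best child and its estimated value."""
    N, Q = defaultdict(int), defaultdict(float)   # visit counts, mean returns
    for _ in range(n_iters):
        untried = [ch for ch in task.children if N[ch.name] == 0]
        if untried:                               # try every child once first
            child = random.choice(untried)
        else:                                     # UCB1 exploration bonus
            total = sum(N[ch.name] for ch in task.children)
            child = max(task.children,
                        key=lambda ch: Q[ch.name]
                        + c * math.sqrt(math.log(total) / N[ch.name]))
        ret, _ = sample_execution(sim, child, state, depth - 1, gamma)
        N[child.name] += 1
        Q[child.name] += (ret - Q[child.name]) / N[child.name]  # running mean
    best = max(task.children, key=lambda ch: Q[ch.name])
    return best, Q[best.name]

def sample_execution(sim, task, state, depth, gamma):
    """Sample one execution of `task`; no subtask model is ever estimated.
    Returns (discounted return, state where the subtask ended)."""
    if task.primitive_action is not None:         # leaf: one simulator step
        next_state, reward = sim.step(state, task.primitive_action)
        return reward, next_state
    ret, disc, s = 0.0, 1.0, state
    while depth > 0 and not task.terminates(s):   # run subtask to termination
        child, _ = h_uct(sim, task, s, depth, gamma)  # recursive, sample-based
        r, s = sample_execution(sim, child, s, depth - 1, gamma)
        ret += disc * r                           # macro-step discounting,
        disc *= gamma                             # simplified for brevity
        depth -= 1
    return ret, s
```

In this sketch each UCB1 call rebuilds its statistics from scratch, and the depth and discount bookkeeping treats a whole subtask execution as a single macro-step; a practical implementation would instead grow and reuse one search tree per (sub)task, as in standard UCT. A hierarchical POMCP variant would run the same scheme over sampled histories and belief particles rather than fully observed states.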
