Progressive Abstraction Refinement for Sparse Sampling

Monte Carlo tree search (MCTS) algorithms can struggle to solve Markov decision processes (MDPs) in which the outcomes of actions are highly stochastic, since each stochastic outcome adds branching to the search tree. State abstraction reduces this stochastic branching by aggregating similar outcomes. In online planning under a time budget, there is a complex tradeoff between the performance lost to an overly coarse abstraction and the performance gained by shrinking the effective problem size. In practice, coarse but unsound abstractions often outperform sound abstractions at realistic budgets. Motivated by this observation, we propose a progressive abstraction refinement algorithm that refines an initially coarse abstraction during search, matching the granularity of the abstraction to the available sample budget. Our experiments show that the algorithm combines the strong performance of coarse abstractions at small sample budgets with the ability to exploit larger budgets for further performance gains.
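
The sketch below illustrates the general idea in Python; it is not the paper's algorithm. It shows a depth-limited sparse-sampling planner that aggregates sampled successors under a state abstraction, then re-plans with a progressively finer abstraction as the per-node sample budget grows. The toy MDP, the integer-bucket abstraction, and all names (ToyMDP, abstract, plan_with_progressive_refinement) are illustrative assumptions.

```python
import random
from collections import defaultdict


class ToyMDP:
    """Hypothetical chain MDP: integer states, two actions (step left/right) with noise."""

    def __init__(self, size=20, goal=19, seed=0):
        self.size, self.goal = size, goal
        self.rng = random.Random(seed)
        self.actions = [-1, +1]

    def sample(self, state, action):
        """Stochastic transition: intended step plus noise; reward 1 for reaching the goal."""
        noise = self.rng.choice([-1, 0, 0, 1])
        nxt = max(0, min(self.size - 1, state + action + noise))
        return nxt, float(nxt == self.goal)


def abstract(state, granularity):
    """Coarse abstraction by integer bucketing; granularity 1 recovers the ground state."""
    return state // granularity


def sparse_sampling_value(mdp, state, depth, width, granularity, gamma=0.95):
    """Depth-limited sparse sampling where sampled successors are aggregated
    by the current abstraction before recursing, reducing stochastic branching."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in mdp.actions:
        # Group the sampled successors into abstract classes.
        classes = defaultdict(list)  # abstract state -> list of (next_state, reward)
        for _ in range(width):
            nxt, r = mdp.sample(state, a)
            classes[abstract(nxt, granularity)].append((nxt, r))
        q = 0.0
        for members in classes.values():
            rep_state, _ = members[0]  # one representative ground state per class
            avg_r = sum(r for _, r in members) / len(members)
            weight = len(members) / width
            q += weight * (avg_r + gamma * sparse_sampling_value(
                mdp, rep_state, depth - 1, width, granularity, gamma))
        best = max(best, q)
    return best


def plan_with_progressive_refinement(mdp, state, budgets=(8, 16, 32), depth=3, gamma=0.95):
    """Anytime-style loop: plan with a coarse abstraction at a small budget,
    then halve the bucket size as the per-node sample budget increases."""
    granularity = 8  # start coarse
    best_action = None
    for width in budgets:
        q_values = {}
        for a in mdp.actions:
            total = 0.0
            for _ in range(width):
                nxt, r = mdp.sample(state, a)
                total += r + gamma * sparse_sampling_value(
                    mdp, nxt, depth - 1, width, granularity, gamma)
            q_values[a] = total / width
        # The latest (largest-budget, finest-abstraction) pass gives the current recommendation.
        best_action = max(q_values, key=q_values.get)
        granularity = max(1, granularity // 2)  # refine the abstraction for the next pass
    return best_action


if __name__ == "__main__":
    mdp = ToyMDP()
    print("chosen action from state 10:", plan_with_progressive_refinement(mdp, 10))
```

In this sketch the refinement schedule (halving the bucket size each pass) is fixed in advance; the point is only that small budgets plan over few, coarse abstract successors while larger budgets plan over progressively finer ones.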
