Sample-Based Tree Search with Fixed and Adaptive State Abstractions

Sample-based tree search (SBTS) is an approach to solving Markov decision problems based on constructing a lookahead search tree using random samples from a generative model of the MDP. It encompasses Monte Carlo tree search (MCTS) algorithms like UCT as well as algorithms such as sparse sampling. SBTS is well-suited to solving MDPs with large state spaces due to the relative insensitivity of SBTS algorithms to the size of the state space. The limiting factor in the performance of SBTS tends to be the exponential dependence of sample complexity on the depth of the search tree. The number of samples required to build a search tree is O((|A|B)^d), where |A| is the number of available actions, B is the number of possible random outcomes of taking an action, and d is the depth of the tree. State abstraction can be used to reduce B by aggregating random outcomes together into abstract states. Recent work has shown that abstract tree search often performs substantially better than tree search conducted in the ground state space. This paper presents a theoretical and empirical evaluation of tree search with both fixed and adaptive state abstractions. We derive a bound on regret due to state abstraction in tree search that decomposes abstraction error into three components arising from properties of the abstraction and the search algorithm. We describe versions of popular SBTS algorithms that use fixed state abstractions, and we introduce the Progressive Abstraction Refinement in Sparse Sampling (PARSS) algorithm, which adapts its abstraction during search. We evaluate PARSS as well as sparse sampling with fixed abstractions on 12 experimental problems, and find that PARSS outperforms search with a fixed abstraction and that search with even highly inaccurate fixed abstractions outperforms search without abstraction. These results establish progressive abstraction refinement as a promising basis for new tree search algorithms, and we propose directions for future work within the progressive refinement framework.
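The central idea above, sparse sampling in which the sampled successors of each action are aggregated by an abstraction function before the search recurses, can be illustrated with a minimal sketch. The Python sketch below is only illustrative and is not the paper's implementation; the generative model simulate(state, action), the abstraction function phi, the action set actions, and the constants GAMMA and WIDTH are hypothetical placeholders introduced here for exposition.

import random
from collections import defaultdict

GAMMA = 0.95   # discount factor (placeholder value)
WIDTH = 8      # samples drawn per action, the "C" of sparse sampling

def sparse_sample_value(simulate, actions, phi, state, depth):
    """Estimate V(state) with a depth-limited sparse-sampling tree whose
    sampled successors are aggregated by the abstraction function phi."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        # Group the sampled successors into abstract states.
        groups = defaultdict(list)   # phi(s') -> list of (reward, s')
        for _ in range(WIDTH):
            next_state, reward = simulate(state, a)
            groups[phi(next_state)].append((reward, next_state))
        # Each abstract successor contributes one recursive evaluation,
        # weighted by how many ground samples it absorbed.
        q = 0.0
        for samples in groups.values():
            rewards = [r for r, _ in samples]
            representative = random.choice(samples)[1]
            v_next = sparse_sample_value(simulate, actions, phi,
                                         representative, depth - 1)
            weight = len(samples) / WIDTH
            q += weight * (sum(rewards) / len(rewards) + GAMMA * v_next)
        best = max(best, q)
    return best

Taking phi to be the identity function recovers ordinary sparse sampling, while a coarser phi reduces the number of recursive calls made per action; this is the reduction in B that the abstract refers to.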
