Improving UCT planning via approximate homomorphisms

In this paper we show how abstractions can help UCT's performance. Ideal abstractions are homomorphisms because they preserve optimal policies, but they rarely exist and are computationally hard to find even when they do. We show how a combination of (i) finding local abstractions in the layered-DAG MDP induced by a set of UCT trajectories (rather than finding abstractions in the global MDP), and (ii) accepting approximate homomorphisms, leads to a greater prevalence of good abstractions and makes them computationally easier to find. We propose an algorithm for finding abstractions in UCT planning and derive a lower bound on its performance. We show empirically that it improves performance on illustrative tasks and on the game of Othello.
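
To make the idea concrete, here is a minimal Python sketch of one plausible reading of the scheme described above; it is not the paper's actual algorithm. The assumption made here is that, within one layer of the DAG induced by UCT trajectories, ground states whose empirical per-action reward and next-layer transition models agree up to a tolerance epsilon are merged into an abstract state, and members of an abstract state then share pooled visit counts and value estimates. All identifiers (Node, model_distance, abstract_layer, epsilon) are hypothetical, and the similarity test is only one possible choice of approximate-homomorphism criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Statistics UCT keeps for one ground state in a single layer of the
    trajectory-induced DAG (hypothetical representation)."""
    visits: int = 0
    value_sum: float = 0.0
    # Empirical model per action: mean reward and distribution over
    # next-layer state indices.
    reward: Dict[str, float] = field(default_factory=dict)
    next_dist: Dict[str, Dict[int, float]] = field(default_factory=dict)


def model_distance(a: Node, b: Node) -> float:
    """Worst-case (over actions) gap between two nodes' empirical models:
    |reward difference| + L1 distance between next-state distributions."""
    gap = 0.0
    for act in set(a.reward) | set(b.reward):
        r_gap = abs(a.reward.get(act, 0.0) - b.reward.get(act, 0.0))
        pa, pb = a.next_dist.get(act, {}), b.next_dist.get(act, {})
        l1 = sum(abs(pa.get(s, 0.0) - pb.get(s, 0.0)) for s in set(pa) | set(pb))
        gap = max(gap, r_gap + l1)
    return gap


def abstract_layer(nodes: List[Node], epsilon: float) -> List[List[int]]:
    """Greedily cluster the nodes of one layer: a node joins the first cluster
    whose representative's model is within epsilon (an approximate
    homomorphism over this layer); otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    reps: List[Node] = []
    for i, node in enumerate(nodes):
        for c, rep in enumerate(reps):
            if model_distance(node, rep) <= epsilon:
                clusters[c].append(i)
                break
        else:
            clusters.append([i])
            reps.append(node)
    return clusters


def pooled_value(nodes: List[Node], cluster: List[int]) -> float:
    """Value estimate shared by all members of one abstract state, obtained by
    pooling their UCT visit counts and returns."""
    visits = sum(nodes[i].visits for i in cluster)
    returns = sum(nodes[i].value_sum for i in cluster)
    return returns / max(visits, 1)
```

In a planner built around this sketch, abstract_layer would be re-run as new trajectories extend each layer, and UCT's action selection would draw on the pooled statistics of each abstract state rather than those of individual ground states.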
