Reinforcement learning (RL) is a well-established paradigm for enabling autonomous agents to learn from experience. To enable RL to scale to any but the smallest domains, it is necessary to make use of abstraction and generalization of the state-action space, for example with a factored representation. However, to make effective use of such a representation, it is necessary to determine which state variables are relevant in which situations. In this work, we introduce T-UCT, a novel model-based RL approach for learning and exploiting the dynamics of structured hierarchical environments. When learning the dynamics while acting, a partial or inaccurate model may do more harm than good. T-UCT uses graph-based planning and Monte Carlo simulations to exploit models that may be incomplete or inaccurate, allowing it to both maximize cumulative rewards and ignore trajectories that are unlikely to succeed. T-UCT incorporates new experiences in the form of more accurate plans that span a greater area of the state space. T-UCT is fully implemented and compared empirically against B-VISA, the best known prior approach to the same problem. We show that T-UCT learns hierarchical models with fewer samples than B-VISA and that this effect is magnified at deeper levels of hierarchical complexity.