TTree: Tree-Based State Generalization with Temporally Abstract Actions

In this chapter we describe the Trajectory Tree, or TTree, algorithm. TTree uses a small set of supplied policies to help solve a Semi-Markov Decision Process (SMDP). The algorithm uses a learned, tree-based discretization of the state space as an abstract state description, and both user-supplied and automatically generated policies as temporally abstract actions. It uses a generative model of the world to sample the transition function of the abstract SMDP defined by these state and temporal abstractions, and then finds a policy for that abstract SMDP. This abstract policy can then be mapped back to a policy for the base SMDP, solving the supplied problem. We present the TTree algorithm and give empirical comparisons with other SMDP algorithms demonstrating its effectiveness.
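
To make the overall flow concrete, the following is a minimal sketch of the outer loop described above: sample temporally extended transitions from a generative model, estimate the abstract SMDP over the leaves of a state-space discretization, and solve it by value iteration. It is not the authors' implementation; the chain world, the fixed interval discretization standing in for the learned tree, the two hand-written "go_left"/"go_right" policies, and all parameter values are assumptions for illustration, and the tree-growing (leaf-splitting) step of TTree is omitted entirely.

```python
# Hypothetical sketch of a TTree-style outer loop (not the authors' code).
# Assumed for illustration: a 1-D chain world as the generative model, a fixed
# interval discretization standing in for the learned tree, and two simple
# policies acting as temporally abstract actions.

import random
from collections import defaultdict

GAMMA = 0.9          # discount factor (assumed)
N_STATES = 20        # size of the toy chain world (assumed)
GOAL = N_STATES - 1

def generative_model(state, primitive_action):
    """Sample (next_state, reward) for one primitive step in the toy chain."""
    nxt = max(0, min(N_STATES - 1, state + primitive_action))
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward

# Temporally abstract actions: policies mapping a base state to a primitive action.
abstract_actions = {"go_left": lambda s: -1, "go_right": lambda s: +1}

def leaf(state):
    """Stand-in for the learned tree: map a base state to an abstract state."""
    return state // 5   # four leaves of width 5

def sample_abstract_transition(state, policy, max_steps=10):
    """Follow a policy until the abstract state changes (or a step limit),
    returning the next leaf, the discounted reward, and gamma^(elapsed steps)."""
    start_leaf, total, discount = leaf(state), 0.0, 1.0
    for _ in range(max_steps):
        state, r = generative_model(state, policy(state))
        total += discount * r
        discount *= GAMMA
        if leaf(state) != start_leaf or state == GOAL:
            break
    return leaf(state), total, discount

def build_abstract_smdp(n_samples=200):
    """Estimate discounted transition weights and expected rewards for the
    abstract SMDP by repeated sampling from the generative model."""
    trans = defaultdict(lambda: defaultdict(float))   # (leaf, action) -> leaf -> E[gamma^tau]
    reward = defaultdict(float)                       # (leaf, action) -> E[discounted reward]
    counts = defaultdict(int)
    for _ in range(n_samples):
        s = random.randrange(N_STATES)
        for name, policy in abstract_actions.items():
            key = (leaf(s), name)
            nxt_leaf, r, d = sample_abstract_transition(s, policy)
            counts[key] += 1
            trans[key][nxt_leaf] += d
            reward[key] += r
    for key, n in counts.items():
        reward[key] /= n
        for nxt in trans[key]:
            trans[key][nxt] /= n
    return trans, reward

def solve_abstract_smdp(trans, reward, n_leaves=4, sweeps=100):
    """Value iteration over abstract states; returns the abstract policy."""
    V = [0.0] * n_leaves
    q = lambda l, a: reward[(l, a)] + sum(
        w * V[nxt] for nxt, w in trans[(l, a)].items())
    for _ in range(sweeps):
        for l in range(n_leaves):
            V[l] = max(q(l, a) for a in abstract_actions)
    return {l: max(abstract_actions, key=lambda a: q(l, a)) for l in range(n_leaves)}

if __name__ == "__main__":
    trans, reward = build_abstract_smdp()
    print(solve_abstract_smdp(trans, reward))
```

In this sketch the abstract policy it prints assigns one temporally abstract action to each leaf; mapping it back to the base SMDP simply means executing, in any base state, the policy chosen for that state's leaf.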
