Paper Information - Hierarchical Policy Gradient Algorithms

Hierarchical Policy Gradient Algorithms

Hierarchical reinforcement learning is a general framework that attempts to accelerate policy learning in large domains. Policy gradient reinforcement learning (PGRL) methods, for their part, have received recent attention as a means to solve problems with continuous state spaces, but they suffer from slow convergence. In this paper, we combine these two approaches and propose a family of hierarchical policy gradient algorithms for problems with continuous state and/or action spaces. We also introduce a class of hierarchical hybrid algorithms, in which a group of subtasks, usually at the higher levels of the hierarchy, are formulated as value function-based RL (VFRL) problems and the others as PGRL problems. We demonstrate the performance of our proposed algorithms on a simple taxi-fuel problem and on a complex ship steering domain with continuous state and action spaces.

Sridhar Mahadevan | Mohammad Ghavamzadeh

[1] P. Marbach. Simulation-Based Methods for Markov Decision Processes, 1998.

[2] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.

[3] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.

[4] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[5] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.

[6] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.

[7] Sridhar Mahadevan, et al. Continuous-Time Hierarchical Reinforcement Learning, 2001, ICML.

[8] Jun Morimoto, et al. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, 2000, Robotics Auton. Syst.

[9] Sridhar Mahadevan, et al. Hierarchically Optimal Average Reward Reinforcement Learning, 2002, ICML.