Accelerating Action-Dependent Hierarchical Reinforcement Learning Through Autonomous Subgoal Discovery

This paper presents a new method for the autonomous construction of hierarchical action and state representations in reinforcement learning, aimed at accelerating learning and extending the scope of such systems. In this approach, the agent uses information acquired while learning one task to discover subgoals for similar tasks by analyzing the learned policy using Monte Carlo sampling. The agent is able to transfer this knowledge to subsequent tasks and to accelerate learning by creating corresponding subtask policies as abstract actions (options). At the same time, the subgoal actions are used to construct a more abstract state representation through action-dependent state space partitioning, adding a new level to the state space hierarchy. This level serves as the initial representation for new learning tasks. To ensure that tasks remain learnable, value functions are built simultaneously at different levels of the hierarchy, and inconsistencies between them are used to identify the actions needed to refine the relevant portions of the abstract state space.
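
As a hedged illustration of the subgoal-discovery step described above, the sketch below samples trajectories from a learned greedy policy, counts how often each state appears on the sampled solution paths, flags frequently visited non-goal states as candidate subgoals, and wraps each candidate as an option. The environment interface (env.step, env.is_goal), the visitation-ratio threshold, and all other names are illustrative assumptions, not the paper's exact criterion or implementation.

    # Illustrative sketch only: Monte Carlo analysis of a learned policy to
    # propose subgoal states, then wrapping each subgoal as an option.
    # The environment interface and the visitation-ratio criterion are
    # assumptions made for this example.
    import random
    from collections import Counter

    def rollout(env, policy, start, max_steps=100):
        """Sample one trajectory by following the learned (greedy) policy."""
        state, path = start, [start]
        for _ in range(max_steps):
            if env.is_goal(state):
                break
            state = env.step(state, policy[state])
            path.append(state)
        return path

    def discover_subgoals(env, policy, starts, n_samples=500, threshold=0.8):
        """Count how often each state lies on sampled solution paths and
        return the non-goal states visited on at least `threshold` of them."""
        counts = Counter()
        for _ in range(n_samples):
            path = rollout(env, policy, random.choice(starts))
            for s in set(path):          # count each state once per path
                counts[s] += 1
        return [s for s, c in counts.items()
                if c / n_samples >= threshold and not env.is_goal(s)]

    class Option:
        """Abstract action that drives the agent to a discovered subgoal."""
        def __init__(self, subgoal, subtask_policy, initiation_set):
            self.subgoal = subgoal
            self.policy = subtask_policy       # policy learned for the subtask
            self.initiation_set = initiation_set

        def terminates(self, state):
            return state == self.subgoal or state not in self.initiation_set

Under these assumptions, the states returned by discover_subgoals would serve as termination conditions for new options, which in turn define the action-dependent partitions used at the next level of the state space hierarchy.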