Automated State Abstraction for Options using the U-Tree Algorithm

Learning a complex task can be significantly facilitated by defining a hierarchy of subtasks. An agent can learn to choose between various temporally abstract actions, each solving an assigned subtask, to accomplish the overall task. In this paper, we study hierarchical learning using the framework of options. We argue that to take full advantage of hierarchical structure, one should perform option-specific state abstraction, and that if this is to scale to larger tasks, state abstraction should be automated. We adapt McCallum's U-Tree algorithm to automatically build option-specific representations of the state feature space, and we illustrate the resulting algorithm using a simple hierarchical task. Results suggest that automated option-specific state abstraction is an attractive approach to making hierarchical learning systems more effective.
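The abstract describes adapting U-Tree to build a separate state representation for each option. The Python sketch below is a rough illustration only, not the authors' implementation. It keeps one abstraction tree per option. Internal nodes test a single state feature, leaves act as abstract states, and a leaf is split on a feature when that feature's values separate the observed returns. McCallum's U-Tree decides splits with a statistical test on return distributions, and it also estimates values per leaf and can use history-based distinctions. This sketch omits all of that. It replaces the statistical test with a simple mean-difference threshold and supplies returns directly. The names `Node`, `UTreeSketch`, `min_samples`, `threshold`, the option labels and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


class Node:
    """Internal nodes test one state feature; leaves are abstract states."""

    def __init__(self, used=()):
        self.feature = None   # index of the tested feature (None means leaf)
        self.children = {}    # feature value -> child Node
        self.samples = []     # (state, return) pairs stored at a leaf
        self.used = used      # features already tested on the path from the root

    def leaf_for(self, state):
        """Walk down to the leaf (abstract state) that contains `state`."""
        node = self
        while node.feature is not None:
            value = state[node.feature]
            if value not in node.children:
                node.children[value] = Node(node.used + (node.feature,))
            node = node.children[value]
        return node


class UTreeSketch:
    """Hypothetical, simplified U-Tree-style tree; build one per option."""

    def __init__(self, n_features, min_samples=5, threshold=1.0):
        self.root = Node()
        self.n_features = n_features
        self.min_samples = min_samples  # hypothetical: samples needed per branch
        self.threshold = threshold      # hypothetical: stand-in for a statistical test

    def add(self, state, ret):
        self.root.leaf_for(state).samples.append((state, ret))

    def leaves(self):
        stack, out = [self.root], []
        while stack:
            node = stack.pop()
            if node.feature is None:
                out.append(node)
            else:
                stack.extend(node.children.values())
        return out

    def try_split(self, leaf):
        """Split `leaf` on the unused feature whose values best separate mean returns."""
        best = None
        for f in range(self.n_features):
            if f in leaf.used:
                continue
            groups = defaultdict(list)
            for state, ret in leaf.samples:
                groups[state[f]].append(ret)
            if len(groups) < 2 or min(len(g) for g in groups.values()) < self.min_samples:
                continue
            means = [mean(g) for g in groups.values()]
            spread = max(means) - min(means)
            if spread > self.threshold and (best is None or spread > best[0]):
                best = (spread, f)
        if best is None:
            return False
        leaf.feature = best[1]
        # Move the stored samples down into the new child leaves.
        for state, ret in leaf.samples:
            leaf.leaf_for(state).samples.append((state, ret))
        leaf.samples = []
        return True

    def refine(self):
        """Split one leaf per pass until no leaf passes the split test."""
        while any(self.try_split(leaf) for leaf in self.leaves()):
            pass


# Toy data: option A's return depends only on feature 0,
# option B's return depends only on feature 2.
trees = {"A": UTreeSketch(3), "B": UTreeSketch(3)}
for x in range(2):
    for y in range(2):
        for d in range(2):
            for _ in range(5):
                trees["A"].add((x, y, d), 10.0 * x)
                trees["B"].add((x, y, d), 10.0 * d)

for name, tree in trees.items():
    tree.refine()
    print(name, tree.root.feature, len(tree.leaves()))
# prints:
# A 0 2
# B 2 2
```

Because each option has its own tree, option A's representation distinguishes only feature 0 and option B's only feature 2, so each option ignores the features that are irrelevant to its subtask. On toy data, this illustrates the kind of option-specific abstraction the abstract argues for.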

Andrew G. Barto | Anders Jonsson

[1] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.

[2] Andrew McCallum, et al. Reinforcement Learning with Selective Perception and Hidden State, 1996.

[3] B. L. Digney. Learning of Hierarchical Control Structures, 1996, Proceedings of the 1996 IEEE International Symposium on Intelligent Control.

[4] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.

[5] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.

[6] William, et al. Generalizing Adversarial Reinforcement Learning, 1997.

[7] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.

[8] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[9] Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning, 1999, NIPS.

[10] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.