Automatic Discovery of Subgoals in Reinforcement Learning Using Unique-Direction Value

Options have proven useful for discovering hierarchical structure in reinforcement learning and thereby speeding up learning. The key problem in automatic option discovery is finding subgoals. Although approaches based on visiting frequency have attracted considerable research attention, many of them fail to distinguish subgoals from their nearby states. Based on the action-restricted property of subgoals that we identify, subgoals can be regarded as the best-matching action-restricted states along sampled paths. For grid-world environments, we introduce the concept of the unique-direction value, which embodies the action-restricted property, to find these best-matching action-restricted states. Experimental results show that the proposed approach finds subgoals correctly and that Q-learning with the discovered options learns substantially faster.
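
As a rough illustration of the idea, the sketch below computes a simple action-restriction statistic from sampled grid-world trajectories: for each state, the fraction of visits in which the most common movement direction was taken, so that states whose traversal direction is heavily restricted (e.g., doorways) score near 1.0. The function names, the exact statistic, and the threshold are illustrative assumptions, not the paper's definition of the unique-direction value.

```python
from collections import defaultdict

def unique_direction_values(trajectories):
    """Hypothetical action-restriction statistic: for each state, the
    fraction of visits in which the most frequent action (direction)
    was taken. trajectories is a list of paths, each a list of
    (state, action) pairs."""
    direction_counts = defaultdict(lambda: defaultdict(int))
    for path in trajectories:
        for state, action in path:
            direction_counts[state][action] += 1
    values = {}
    for state, counts in direction_counts.items():
        total = sum(counts.values())
        values[state] = max(counts.values()) / total
    return values

def candidate_subgoals(trajectories, threshold=0.9):
    """Treat states whose statistic exceeds a threshold (an assumed
    cutoff, not taken from the paper) as subgoal candidates."""
    values = unique_direction_values(trajectories)
    return [s for s, v in values.items() if v >= threshold]
```

In a four-room grid world, for instance, a doorway cell is almost always crossed in the same direction along successful paths, so it would receive a high score under this kind of statistic, while open-room cells visited from many directions would not.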