Memory efficient factored abstraction for reinforcement learning

Classical reinforcement learning techniques are often inadequate for problems with large state spaces due to the curse of dimensionality. If states can be represented as a set of variables, the environment can be modeled more compactly. Automatically detecting and using temporal abstractions during learning has been shown to be an effective way to increase learning speed. In this paper, we propose a factored automatic temporal abstraction method that builds on an existing temporal abstraction strategy, the extended sequence tree algorithm, and represents differences between states as changes in their state variables. The proposed method is shown to provide significant memory gains on selected benchmark problems.
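The core representational idea in the abstract is describing how one factored state differs from the next by listing only the state variables that change. The sketch below illustrates that idea in isolation. It is a hypothetical example of delta-encoding factored states, not the extended sequence tree algorithm or the authors' implementation. The function names `state_delta`, `apply_delta`, `encode_trajectory`, and `decode_trajectory`, and the three-variable example state `(x, y, has_key)`, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A factored state: one value per state variable, e.g. (x, y, has_key).
State = Tuple[int, ...]
# A delta stores only the variables whose values changed: {variable_index: new_value}.
Delta = Dict[int, int]


def state_delta(s: State, s_next: State) -> Delta:
    """Return the variables whose values differ between two factored states."""
    assert len(s) == len(s_next), "states must have the same variables"
    return {i: v for i, (u, v) in enumerate(zip(s, s_next)) if u != v}


def apply_delta(s: State, delta: Delta) -> State:
    """Reconstruct the successor state by overwriting the changed variables of s."""
    values = list(s)
    for i, v in delta.items():
        values[i] = v
    return tuple(values)


def encode_trajectory(states: List[State]) -> Tuple[State, List[Delta]]:
    """Store a trajectory as its first full state plus one delta per step."""
    deltas = [state_delta(a, b) for a, b in zip(states, states[1:])]
    return states[0], deltas


def decode_trajectory(start: State, deltas: List[Delta]) -> List[State]:
    """Rebuild the full sequence of states from the compact encoding."""
    states = [start]
    for d in deltas:
        states.append(apply_delta(states[-1], d))
    return states


if __name__ == "__main__":
    # Hypothetical trajectory over three variables: (x, y, has_key).
    trajectory = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 1, 1)]
    start, deltas = encode_trajectory(trajectory)
    print(deltas)  # [{0: 1}, {0: 2}, {1: 1}, {2: 1}]
    assert decode_trajectory(start, deltas) == trajectory
    # Count stored integers: full states vs. start state plus (index, value) pairs.
    full_ints = sum(len(s) for s in trajectory)
    delta_ints = len(start) + 2 * sum(len(d) for d in deltas)
    print(full_ints, delta_ints)  # 15 11
```

The saving grows with the number of state variables when each step changes only a few of them. If most variables change on every step, the index stored with each changed value can make a delta larger than the full state it replaces.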
