A memory-based reinforcement learning algorithm for partially observable Markovian decision processes

This paper presents a modified version of U-Tree (A.K. McCallum, 1996), a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markovian decision processes (POMDPs). Conventional RL algorithms rely on a set of predefined states to model the environment, although they can learn the state transitions from experience. U-Tree not only learns the transitions but also builds the state model itself from raw sensor inputs. This paper enhances U-Tree's model-generation process. The paper also shows that, because of the simplified yet effective state model generated by U-Tree, it is feasible and preferable to adopt the classical dynamic programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-Tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the environment's dynamics.
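As background for the DP component mentioned in the abstract, the sketch below shows relative value iteration, a classical DP method for average-reward MDPs. It is illustrative only: the two-state transition probabilities and rewards are hypothetical and are not taken from the paper, and the paper's own solver operates on the state model built by U-Tree rather than on a hand-specified MDP.

```python
import numpy as np

def relative_value_iteration(P, R, tol=1e-8, max_iter=10_000):
    """Relative value iteration for an average-reward MDP.

    P: array of shape (A, S, S), P[a, s, s'] = transition probability
    R: array of shape (A, S),    R[a, s]     = expected one-step reward
    Returns the estimated gain (average reward per step), the relative
    value (bias) vector h, and a greedy policy.
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    ref = 0                               # arbitrary reference state
    for _ in range(max_iter):
        # One-step lookahead for every action: Q[a, s]
        Q = R + np.einsum('ast,t->as', P, h)
        h_new = Q.max(axis=0)
        gain = h_new[ref]                 # current estimate of the gain
        h_new = h_new - gain              # renormalize to keep values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=0)
    return gain, h, policy

# Tiny two-state example with made-up numbers, for illustration only.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # action 1
R = np.array([[1.0, 0.0],                  # reward for action 0 in s0, s1
              [0.5, 2.0]])                 # reward for action 1 in s0, s1
gain, h, policy = relative_value_iteration(P, R)
print(gain, h, policy)
```

The average-reward (gain) criterion is a natural fit for continuing tasks such as the car-driving domain used in the paper, since it optimizes long-run reward per step rather than a discounted sum.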