Paper Information - Reinforcement Learning with Hierarchical Decision-Making

Reinforcement Learning with Hierarchical Decision-Making

This paper proposes a simple hierarchical decision-making approach to reinforcement learning within the framework of Markov decision processes. Under this approach, the action at each time stage is chosen by successively eliminating actions and sets of actions from the underlying action space until a single action remains. Based on this approach, the paper defines a hierarchical Q-function and shows that this function can serve as the basis for an optimal policy. A hierarchical reinforcement learning algorithm is then proposed. The algorithm, which can be shown to converge to the hierarchical Q-function, provides new opportunities for state abstraction.
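The successive-elimination idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not define the hierarchical Q-function, so the sketch assumes that the value of a set of actions is the largest flat Q-value among the actions it contains, and that the action space is given as a nested tree. The names `ActionTree`, `leaves`, `set_value`, and `select_action` are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical representation: a node is either a primitive action (a string)
# or a list of child nodes, i.e. a set of actions that can be split further.
ActionTree = Union[str, List["ActionTree"]]


def leaves(node: ActionTree) -> List[str]:
    """Return every primitive action contained in a node."""
    if isinstance(node, str):
        return [node]
    found: List[str] = []
    for child in node:
        found.extend(leaves(child))
    return found


def set_value(node: ActionTree, q: Dict[str, float]) -> float:
    """Assumed value of an action set: the best flat Q-value inside it."""
    return max(q[action] for action in leaves(node))


def select_action(tree: ActionTree, q: Dict[str, float]) -> str:
    """Descend the tree, keeping the best child at each level and
    eliminating its siblings, until a single primitive action remains."""
    node = tree
    while not isinstance(node, str):
        node = max(node, key=lambda child: set_value(child, q))
    return node


# Q-values for one fixed state (made-up numbers for illustration only).
q_values = {"left": 0.1, "right": 0.4, "up": 0.3, "down": 0.9, "stay": 0.2}
action_tree: ActionTree = [["left", "right"], ["up", ["down", "stay"]]]

print(select_action(action_tree, q_values))  # prints "down"
```

Under the assumed set value, the staged choice agrees with a flat greedy choice over all actions, which loosely mirrors the abstract's claim that a hierarchical Q-function can be the basis for an optimal policy.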
