Reinforcement Learning with Hierarchical Decision-Making

This paper proposes a simple hierarchical decision-making approach to reinforcement learning within the framework of Markov decision processes. Under this approach, the action at each time stage is chosen by successively eliminating actions and sets of actions from the underlying action space until a single action remains. Based on the approach, the paper defines a hierarchical Q-function and shows that this function can serve as the basis for an optimal policy. A hierarchical reinforcement learning algorithm is then proposed. The algorithm, which can be shown to converge to the hierarchical Q-function, provides new opportunities for state abstraction.
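The successive-elimination idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the definition of the hierarchical Q-function, so the sketch assumes, purely for illustration, that the value of a set of actions is the largest ordinary Q-value among its members. The names `select_action`, `set_value`, `leaves`, the action tree, and the Q-values are all hypothetical.

```python
from typing import Dict, List, Union

# A node is either a primitive action (a string) or a list of child nodes,
# so the whole action space is a tree of nested action sets.
ActionTree = Union[str, List["ActionTree"]]


def leaves(node: ActionTree) -> List[str]:
    """Return every primitive action contained in a node."""
    if isinstance(node, str):
        return [node]
    found: List[str] = []
    for child in node:
        found.extend(leaves(child))
    return found


def set_value(q: Dict[str, float], node: ActionTree) -> float:
    """ASSUMPTION: a set of actions is worth its best member's Q-value."""
    return max(q[action] for action in leaves(node))


def select_action(q: Dict[str, float], tree: ActionTree) -> str:
    """Descend the tree, keeping the best child and eliminating the rest,
    until only a single primitive action is left."""
    node = tree
    while not isinstance(node, str):
        node = max(node, key=lambda child: set_value(q, child))
    return node


# Hypothetical Q-values for one fixed state, and a two-level action tree.
q_values = {"north": 0.2, "south": 0.7, "pick": 0.5, "drop": 0.1}
tree = [["north", "south"], ["pick", "drop"]]
chosen = select_action(q_values, tree)
```

Under this assumed set value, the descent first discards the `["pick", "drop"]` set and then discards `"north"`, so it arrives at the same action as a flat greedy choice over all four Q-values. A learned hierarchical Q-function would replace `set_value` with per-level estimates, each of which could depend on a different abstraction of the state.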
