Paper Information - An Overview of MAXQ Hierarchical Reinforcement Learning

An Overview of MAXQ Hierarchical Reinforcement Learning

Reinforcement learning addresses the problem of learning optimal policies for sequential decision-making problems involving stochastic operators and numerical reward functions rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement learning research is recapitulating the development of classical research in planning and problem solving. After studying the problem of solving "flat" problem spaces, researchers have recently turned their attention to hierarchical methods that incorporate subroutines and state abstractions. This paper gives an overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction.

Thomas G. Dietterich
