Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs

If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and then someone asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, in an MDP with N states preparing for all N possible goals can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems it is usual to generate a policy π(x) mapping states to actions, and a value function J(x) mapping states to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value-function J*(x, y) is a definition of this minimum cost. Building these objects quickly and with little memory is the main purpose of this paper, but a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
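A minimal sketch of the Bellman-style recursion implied by these definitions, under the assumption that c(x, a) denotes the immediate cost of taking action a in state x and P(x' | x, a) the transition probability (this notation is assumed for illustration and is not taken from the abstract):

J^*(y, y) = 0

J^*(x, y) = \min_{a} \Big[ c(x, a) + \sum_{x'} P(x' \mid x, a)\, J^*(x', y) \Big]

\pi^*(x, y) = \arg\min_{a} \Big[ c(x, a) + \sum_{x'} P(x' \mid x, a)\, J^*(x', y) \Big]

Solving this recursion independently for each of the N possible goals y would cost roughly N times the effort of a single-goal solution; the abstract's claim is that an abstract action hierarchy can make the all-goals computation substantially cheaper than that.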