Paper Information - Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstraction

In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov decision processes (MDPs). We focus on medium-size problems whose state space can be fully enumerated. This class of problems has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. The approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.
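To make the problem class concrete, the following is a minimal sketch of plain value iteration on a tiny, fully enumerated SSP. It is not the paper's abstraction method, only the kind of flat baseline planning that such a hierarchy is meant to speed up. The state names, action names, costs, and the `value_iteration` helper are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, cost)
# Hypothetical toy SSP: reach goal "G" from "A" at minimum expected cost.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(transitions: Transitions, goal: str,
                    tol: float = 1e-9, max_iters: int = 10_000):
    """In-place (Gauss-Seidel) value iteration for an SSP with cost-to-go values."""
    values = {s: 0.0 for s in transitions}
    values[goal] = 0.0  # the goal is absorbing and cost-free

    def q(outcomes):
        # Expected immediate cost plus expected cost-to-go of the successor.
        return sum(p * (c + values[s2]) for p, s2, c in outcomes)

    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if s == goal:
                continue
            best = min(q(outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: min(actions, key=lambda a: q(actions[a]))
        for s, actions in transitions.items() if s != goal
    }
    return values, policy


toy: Transitions = {
    "A": {
        "go": [(0.5, "B", 1.0), (0.5, "A", 1.0)],  # may slip and stay in A
        "jump": [(1.0, "G", 5.0)],                  # safe but expensive
    },
    "B": {"go": [(1.0, "G", 1.0)]},
}

values, policy = value_iteration(toy, goal="G")
print(values, policy)
```

In this toy problem the expected cost of repeatedly trying `go` from `A` satisfies V(A) = 1 + 0.5 V(A) + 0.5 V(B) with V(B) = 1, so V(A) = 3, which beats the fixed cost of 5 for `jump`.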
