Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstraction

In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov decision processes (MDPs). We focus on medium-sized problems whose state space can be fully enumerated. Such problems arise in many important applications, including navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy at the most abstract level and then recursively refining it into a solution to the original problem. The approach is fully automated, and on sample problems it delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.
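The solve-abstract-then-refine idea can be illustrated with a deliberately simplified, hypothetical sketch. It is not the automated abstraction construction described above: it assumes a toy gridworld SSP (`GRID`, unit step cost, a `SLIP` probability that a move fails and the agent stays put), a single abstract level built by fixed k-by-k block aggregation, a shortest path over blocks as the "abstract policy", and refinement that runs value iteration only on ground states inside the blocks on that path. All names and parameters are illustrative assumptions. In this particular slip model, restricting the state space can only raise the start state's cost-to-go, so the refined value is an upper bound on the optimal one; the paper's own optimality bounds are not reproduced here.

```python
from collections import deque

# Hypothetical toy SSP: '#' is a wall, S the start cell, G the absorbing goal.
GRID = [
    "S.......",
    "........",
    "######..",
    "........",
    "........",
    "..######",
    "........",
    ".......G",
]
SLIP = 0.1  # assumed chance that a move fails and the agent stays put
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def find(ch):
    """Return the (row, col) of the first occurrence of `ch` in GRID."""
    for r, row in enumerate(GRID):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(ch)


def connected_to(goal, cells):
    """Cells of `cells` that can reach `goal` using only moves inside `cells`.
    Moves are symmetric, so a BFS outward from the goal finds exactly these."""
    seen, queue = {goal}, deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            n = (r + dr, c + dc)
            if n in cells and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


def value_iteration(cells, goal, tol=1e-10):
    """Solve the SSP restricted to `cells` (each step costs 1, goal costs 0).
    A move whose target lies outside `cells` behaves like bumping a wall."""
    V = {s: 0.0 for s in cells}
    while True:
        new_V, delta = {goal: 0.0}, 0.0
        for s in cells:
            if s == goal:
                continue
            best = float("inf")
            for dr, dc in MOVES:
                n = (s[0] + dr, s[1] + dc)
                if n not in cells:
                    n = s  # blocked: the agent stays where it is
                best = min(best, 1.0 + (1 - SLIP) * V[n] + SLIP * V[s])
            new_V[s] = best
            delta = max(delta, abs(best - V[s]))
        V = new_V
        if delta < tol:
            return V


def block_of(cell, k):
    return (cell[0] // k, cell[1] // k)


def abstract_path(free, start, goal, k):
    """Shortest path over k-by-k blocks; two blocks are linked when a pair of
    adjacent free cells straddles their border (BFS, since edges are unit)."""
    edges = {}
    for r, c in free:
        for dr, dc in MOVES:
            n = (r + dr, c + dc)
            if n in free and block_of(n, k) != block_of((r, c), k):
                edges.setdefault(block_of((r, c), k), set()).add(block_of(n, k))
    src, dst = block_of(start, k), block_of(goal, k)
    parent, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        for v in sorted(edges.get(u, ())):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, u = [], (dst if dst in parent else None)
    while u is not None:
        path.append(u)
        u = parent[u]
    return path[::-1]


def solve(k=2):
    free = {(r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch != "#"}
    start, goal = find("S"), find("G")
    full = value_iteration(connected_to(goal, free), goal)
    # Refinement: solve only the ground states inside blocks on the abstract path.
    blocks = set(abstract_path(free, start, goal, k))
    corridor = connected_to(goal, {s for s in free if block_of(s, k) in blocks})
    if start not in corridor:  # abstract path not realizable: fall back to ground problem
        corridor = connected_to(goal, free)
    refined = value_iteration(corridor, goal)
    return full[start], refined[start], len(free), len(corridor)
```

Calling `solve(k=2)` returns the start state's optimal cost, its cost under the refined (corridor-restricted) solve, and the number of ground states each solve touched. On a map this small the savings are negligible; the sketch only shows the control flow, and a real hierarchy would stack several abstraction levels and build them automatically rather than from fixed blocks.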
