Automated Hierarchy Discovery for Planning in Partially Observable Domains

Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems, which can then be solved independently of one another. Several approaches, mainly dealing with fully observable domains, have been proposed to optimize a plan that decomposes according to a hierarchy specified a priori. Some researchers have also proposed to discover hierarchies in fully observable domains. In this thesis, we investigate the problem of automatically discovering planning hierarchies in partially observable domains. The main advantage of discovering hierarchies is that, for a fixed plan size, hierarchical plans can be more expressive than nonhierarchical ones. Our solution frames the discovery and optimization of a hierarchical policy as a nonconvex optimization problem. By encoding the hierarchical structure as variables of the optimization problem, we can discover a hierarchy automatically; successfully solving the optimization problem therefore yields both an optimal hierarchy and an optimal policy. We describe several techniques for solving the optimization problem: general nonlinear solvers, mixed-integer linear and nonlinear solvers, and a form of bounded hierarchical policy iteration, and we report results for each. Our method is flexible enough to let any part of the hierarchy be specified from prior knowledge while the optimization discovers the remaining parts. It can also discover hierarchical policies, including recursive ones, that are more compact than equivalent nonhierarchical policies (potentially requiring infinitely fewer parameters).
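To make the nonconvex program concrete, the following is a minimal sketch, in the spirit of the quadratically constrained formulations used for flat POMDP controllers, rather than the thesis's exact program. A stochastic finite-state controller has nodes q, action-selection probabilities \pi(a \mid q), and node-transition probabilities \eta(q' \mid q, o); treating these and the values V(q, s) all as decision variables gives

\begin{align*}
\max_{\pi,\,\eta,\,V}\quad & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.}\quad & V(q,s) = \sum_{a} \pi(a \mid q) \Big[ R(s,a) + \gamma \sum_{s'} T(s' \mid s,a) \sum_{o} \Omega(o \mid s',a) \sum_{q'} \eta(q' \mid q,o)\, V(q',s') \Big] \quad \forall\, q, s \\
& \sum_{a} \pi(a \mid q) = 1 \quad \forall\, q, \qquad \sum_{q'} \eta(q' \mid q,o) = 1 \quad \forall\, q, o, \qquad \pi \ge 0,\ \eta \ge 0,
\end{align*}

where b_0 is the initial belief, q_0 the start node, and T, \Omega, R, \gamma the POMDP's transition, observation, reward, and discount parameters. The Bellman constraints multiply policy variables by value variables, which is the source of the nonconvexity; encoding the hierarchical structure itself as additional (in some formulations integer-valued) variables, as the abstract describes, is what brings the mixed-integer solvers into play.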

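As an illustration of the general-nonlinear-solver route, the sketch below locally optimizes such a flat controller for a randomly generated toy POMDP. It is an assumption-laden sketch, not the thesis's implementation: the model and all of its numbers are invented, the controller is flat rather than hierarchical, and a softmax reparameterization stands in for the explicit simplex constraints so that an unconstrained quasi-Newton method can be used.

import numpy as np
from scipy.optimize import minimize

# Minimal sketch (not the thesis's code): locally optimize a small flat
# finite-state controller for a toy POMDP with an off-the-shelf nonlinear
# solver. All model quantities below are randomly generated placeholders.
S, A, O, Q = 2, 2, 2, 2        # states, actions, observations, controller nodes
gamma = 0.95
rng = np.random.default_rng(0)

T = rng.dirichlet(np.ones(S), size=(S, A))   # T[s, a, s']: transition probabilities
Z = rng.dirichlet(np.ones(O), size=(S, A))   # Z[s', a, o]: observation probabilities
R = rng.standard_normal((S, A))              # R[s, a]: rewards
b0 = np.full(S, 1.0 / S)                     # uniform initial belief; node 0 is the start node

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def controller_value(theta):
    # Unpack logits into pi[q, a] = P(a | q) and eta[q, o, r] = P(next node r | q, o);
    # softmax keeps both on the probability simplex without explicit constraints.
    pi = softmax(theta[:Q * A].reshape(Q, A))
    eta = softmax(theta[Q * A:].reshape(Q, O, Q))
    # For fixed (pi, eta) the controller's Bellman equations V = b + M V are
    # linear in V(q, s), so solve them exactly:
    # M[q, s, r, p] = gamma * sum_{a,o} pi[q, a] T[s, a, p] Z[p, a, o] eta[q, o, r]
    M = gamma * np.einsum('qa,sap,pao,qor->qsrp', pi, T, Z, eta)
    b = np.einsum('qa,sa->qs', pi, R)
    V = np.linalg.solve(np.eye(Q * S) - M.reshape(Q * S, Q * S), b.ravel())
    return b0 @ V.reshape(Q, S)[0]           # expected value of start node 0

theta0 = rng.standard_normal(Q * A + Q * O * Q)
res = minimize(lambda th: -controller_value(th), theta0, method='BFGS')
print('locally optimal controller value:', -res.fun)

Because the objective is nonconvex, a quasi-Newton method only finds a local optimum; restarting from several initial theta0 draws, or handing the constrained program above to an SQP or mixed-integer solver, corresponds to the alternative solution strategies the abstract lists.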