Automated Hierarchy Discovery for Planning in Partially Observable Domains

Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems, which can then be solved independently of one another. Several approaches, mainly dealing with fully observable domains, have been proposed to optimize a plan that decomposes according to a hierarchy specified a priori. Some researchers have also proposed to discover hierarchies in fully observable domains. In this thesis, we investigate the problem of automatically discovering planning hierarchies in partially observable domains. The main advantage of discovering hierarchies is that, for a fixed plan size, hierarchical plans can be more expressive than nonhierarchical ones. Our solution frames the discovery and optimization of a hierarchical policy as a nonconvex optimization problem. By encoding the hierarchical structure as variables of the optimization problem, we can discover a hierarchy automatically; successfully solving the optimization problem therefore yields both an optimal hierarchy and an optimal policy. We describe several techniques for solving the optimization problem: general nonlinear solvers, mixed-integer linear and nonlinear solvers, and a form of bounded hierarchical policy iteration, and we report results for each. Our method is flexible enough to let any part of the hierarchy be specified from prior knowledge while the optimization discovers the remaining parts. It can also discover hierarchical policies, including recursive ones, that are more compact than equivalent nonhierarchical policies (potentially requiring infinitely fewer parameters).
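To make the nonconvex program concrete, the following is a minimal sketch, in the spirit of the quadratically constrained formulations used for flat POMDP controllers, rather than the thesis's exact program. A stochastic finite-state controller has nodes q, action-selection probabilities \pi(a \mid q), and node-transition probabilities \eta(q' \mid q, o); treating these and the values V(q, s) all as decision variables gives

\begin{align*}
\max_{\pi,\,\eta,\,V}\quad & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.}\quad & V(q,s) = \sum_{a} \pi(a \mid q) \Big[ R(s,a) + \gamma \sum_{s'} T(s' \mid s,a) \sum_{o} \Omega(o \mid s',a) \sum_{q'} \eta(q' \mid q,o)\, V(q',s') \Big] \quad \forall\, q, s \\
& \sum_{a} \pi(a \mid q) = 1 \quad \forall\, q, \qquad \sum_{q'} \eta(q' \mid q,o) = 1 \quad \forall\, q, o, \qquad \pi \ge 0,\ \eta \ge 0,
\end{align*}

where b_0 is the initial belief, q_0 the start node, and T, \Omega, R, \gamma the POMDP's transition, observation, reward, and discount parameters. The Bellman constraints multiply policy variables by value variables, which is the source of the nonconvexity; encoding the hierarchical structure itself as additional (in some formulations integer-valued) variables, as the abstract describes, is what brings the mixed-integer solvers into play.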

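As an illustration of the general-nonlinear-solver route, the sketch below locally optimizes such a flat controller for a randomly generated toy POMDP. It is an assumption-laden sketch, not the thesis's implementation: the model and all of its numbers are invented, the controller is flat rather than hierarchical, and a softmax reparameterization stands in for the explicit simplex constraints so that an unconstrained quasi-Newton method can be used.

import numpy as np
from scipy.optimize import minimize

# Minimal sketch (not the thesis's code): locally optimize a small flat
# finite-state controller for a toy POMDP with an off-the-shelf nonlinear
# solver. All model quantities below are randomly generated placeholders.
S, A, O, Q = 2, 2, 2, 2        # states, actions, observations, controller nodes
gamma = 0.95
rng = np.random.default_rng(0)

T = rng.dirichlet(np.ones(S), size=(S, A))   # T[s, a, s']: transition probabilities
Z = rng.dirichlet(np.ones(O), size=(S, A))   # Z[s', a, o]: observation probabilities
R = rng.standard_normal((S, A))              # R[s, a]: rewards
b0 = np.full(S, 1.0 / S)                     # uniform initial belief; node 0 is the start node

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def controller_value(theta):
    # Unpack logits into pi[q, a] = P(a | q) and eta[q, o, r] = P(next node r | q, o);
    # softmax keeps both on the probability simplex without explicit constraints.
    pi = softmax(theta[:Q * A].reshape(Q, A))
    eta = softmax(theta[Q * A:].reshape(Q, O, Q))
    # For fixed (pi, eta) the controller's Bellman equations V = b + M V are
    # linear in V(q, s), so solve them exactly:
    # M[q, s, r, p] = gamma * sum_{a,o} pi[q, a] T[s, a, p] Z[p, a, o] eta[q, o, r]
    M = gamma * np.einsum('qa,sap,pao,qor->qsrp', pi, T, Z, eta)
    b = np.einsum('qa,sa->qs', pi, R)
    V = np.linalg.solve(np.eye(Q * S) - M.reshape(Q * S, Q * S), b.ravel())
    return b0 @ V.reshape(Q, S)[0]           # expected value of start node 0

theta0 = rng.standard_normal(Q * A + Q * O * Q)
res = minimize(lambda th: -controller_value(th), theta0, method='BFGS')
print('locally optimal controller value:', -res.fun)

Because the objective is nonconvex, a quasi-Newton method only finds a local optimum; restarting from several initial theta0 draws, or handing the constrained program above to an SQP or mixed-integer solver, corresponds to the alternative solution strategies the abstract lists.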