Task-Based Decomposition of Factored POMDPs

Recently, solvers for partially observable Markov decision processes (POMDPs) have shown the ability to scale up significantly by exploiting domain structure, such as factored representations. In many domains, the agent must complete a set of independent tasks. We propose to decompose a factored POMDP into a set of restricted POMDPs, each defined over the subset of state variables relevant to one task. We solve each restricted model independently, obtaining a value function for it. These value functions are then combined to form a policy for the complete POMDP. We explain how to identify the variables that correspond to tasks, and how to create a model restricted to a single task or to a subset of tasks. We demonstrate our approach on a number of benchmarks from the factored POMDP literature, showing that our methods are applicable to models with more than 100 state variables.
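
The pipeline described in the abstract can be pictured with a minimal sketch. Everything below is illustrative and not taken from the paper: the class name `RestrictedPOMDP`, the hand-supplied task-relevant variable sets, the marginal belief representation, and the combination rule (acting for the task whose restricted value function reports the highest value) are all assumptions; the paper derives the restricted models from the factored structure and leaves the combination of value functions to its own construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# A belief over binary state variables, kept as independent marginals
# (a simplification; a factored belief need not factor exactly like this).
Belief = Dict[str, float]


@dataclass
class RestrictedPOMDP:
    """A POMDP projected onto the state variables relevant to one task."""
    task: str
    variables: FrozenSet[str]
    # A solved restricted model exposes a value function and a policy over
    # its restricted belief; real ones would come from a POMDP solver
    # (e.g., a point-based algorithm) run on the restricted model.
    value: Callable[[Belief], float]
    policy: Callable[[Belief], str]


def marginal(belief: Belief, variables: FrozenSet[str]) -> Belief:
    """Project the full belief onto a restricted variable subset."""
    return {v: p for v, p in belief.items() if v in variables}


def act(models: List[RestrictedPOMDP], belief: Belief) -> str:
    """Combine the per-task solutions: act according to the task whose
    restricted value function is highest on its marginal belief
    (one plausible combination rule among several)."""
    best = max(models, key=lambda m: m.value(marginal(belief, m.variables)))
    return best.policy(marginal(belief, best.variables))


if __name__ == "__main__":
    # Two toy tasks over disjoint variable subsets of a 4-variable POMDP.
    deliver = RestrictedPOMDP(
        task="deliver",
        variables=frozenset({"has_mail", "at_office"}),
        value=lambda b: b["has_mail"],  # valuable when mail is likely held
        policy=lambda b: "go_to_office" if b["at_office"] < 0.5 else "drop_mail",
    )
    clean = RestrictedPOMDP(
        task="clean",
        variables=frozenset({"dirty", "at_lab"}),
        value=lambda b: b["dirty"],
        policy=lambda b: "go_to_lab" if b["at_lab"] < 0.5 else "vacuum",
    )
    belief = {"has_mail": 0.9, "at_office": 0.2, "dirty": 0.3, "at_lab": 0.8}
    print(act([deliver, clean], belief))  # -> go_to_office
```

The point the sketch captures is that each restricted model only ever sees a belief over its own variables, so solving it is exponentially cheaper than solving the full model; the full policy is assembled online from the per-task solutions.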
