Solving Factored POMDPs with Linear Value Functions

Partially Observable Markov Decision Processes (POMDPs) provide a coherent mathematical framework for planning under uncertainty when the state of the system cannot be fully observed. However, finding an exact POMDP solution is intractable. Computing such a solution requires manipulating a piecewise linear convex value function, which specifies a value for every possible belief state. This value function can be represented by a set of vectors, each with dimension equal to the size of the state space. In nontrivial problems, however, these vectors are too large for such a representation to be feasible, preventing the use of exact POMDP algorithms. We propose an approximation scheme in which each vector is represented as a linear combination of basis functions, yielding a compact approximation to the value function. We also show that this representation can be exploited to allow efficient computation in approximate value and policy iteration algorithms for factored POMDPs, where the transition model is specified using a dynamic Bayesian network.
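
As a hedged, illustrative sketch (not the paper's implementation), the Python snippet below shows the core representational idea: each α-vector of the piecewise linear convex value function is stored as a small weight vector over basis functions, so the value of a belief state can be computed without ever materializing a vector of length |S|. All names and sizes here (H, alpha_weights, value) are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes: |S| = 1000 flat states, approximated with
# k = 10 basis functions (k << |S|).
num_states, num_bases = 1000, 10

# H: each row is one basis function h_j over the state space. In a
# factored POMDP these would depend on small subsets of state variables;
# random vectors stand in for them here.
rng = np.random.default_rng(0)
H = rng.standard_normal((num_bases, num_states))

# Each alpha-vector is stored compactly as a weight vector w,
# so that alpha ≈ w @ H (a linear combination of the basis functions).
alpha_weights = [rng.standard_normal(num_bases) for _ in range(5)]

def value(belief: np.ndarray) -> float:
    """V(b) = max_i alpha_i · b, with alpha_i = w_i @ H.

    Computing H @ b once gives the expectations of the k basis
    functions under belief b; each alpha-vector then costs only a
    k-dimensional inner product instead of an |S|-dimensional one.
    """
    basis_expectations = H @ belief  # shape (k,)
    return max(w @ basis_expectations for w in alpha_weights)

# Example: evaluate the approximate value of a uniform belief.
b = np.full(num_states, 1.0 / num_states)
print(value(b))
```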
