Solving Factored POMDPs with Linear Value Functions

Partially Observable Markov Decision Processes (POMDPs) provide a coherent mathematical framework for planning under uncertainty when the state of the system cannot be fully observed. However, finding an exact POMDP solution is intractable. Computing such a solution requires manipulating a piecewise linear convex value function, which specifies a value for every possible belief state. This value function can be represented by a set of vectors, each with dimension equal to the size of the state space. In nontrivial problems, however, these vectors are too large for such a representation to be feasible, preventing the use of exact POMDP algorithms. We propose an approximation scheme in which each vector is represented as a linear combination of basis functions, yielding a compact approximation to the value function. We also show that this representation can be exploited to allow efficient computation in approximate value and policy iteration algorithms for factored POMDPs, where the transition model is specified using a dynamic Bayesian network.
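
As a hedged, illustrative sketch (not the paper's implementation), the Python snippet below shows the core representational idea: each α-vector of the piecewise linear convex value function is stored as a small weight vector over basis functions, so the value of a belief state can be computed without ever materializing a vector of length |S|. All names and sizes here (H, alpha_weights, value) are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes: |S| = 1000 flat states, approximated with
# k = 10 basis functions (k << |S|).
num_states, num_bases = 1000, 10

# H: each row is one basis function h_j over the state space. In a
# factored POMDP these would depend on small subsets of state variables;
# random vectors stand in for them here.
rng = np.random.default_rng(0)
H = rng.standard_normal((num_bases, num_states))

# Each alpha-vector is stored compactly as a weight vector w,
# so that alpha ≈ w @ H (a linear combination of the basis functions).
alpha_weights = [rng.standard_normal(num_bases) for _ in range(5)]

def value(belief: np.ndarray) -> float:
    """V(b) = max_i alpha_i · b, with alpha_i = w_i @ H.

    Computing H @ b once gives the expectations of the k basis
    functions under belief b; each alpha-vector then costs only a
    k-dimensional inner product instead of an |S|-dimensional one.
    """
    basis_expectations = H @ belief  # shape (k,)
    return max(w @ basis_expectations for w in alpha_weights)

# Example: evaluate the approximate value of a uniform belief.
b = np.full(num_states, 1.0 / num_states)
print(value(b))
```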
