Computationally Feasible Bounds for Partially Observed Markov Decision Processes

A partially observed Markov decision process (POMDP) is a sequential decision problem in which information about the parameters of interest is incomplete and the available actions include sampling, surveying, or otherwise collecting additional information. Such problems can in theory be solved as dynamic programs, but the relevant state space is infinite, which prevents direct algorithmic solution. This paper explains how to approximate the state space by a finite grid of points and how to use that grid to construct upper and lower bounds on the value function, generate approximate nonstationary and stationary policies, and bound the loss in value, relative to the optimum, incurred by using these policies in the decision problem. A numerical example illustrates the methodology.
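
To make the idea of a finite grid concrete, the following sketch assumes the standard formulation in which the state of a POMDP is a probability vector (a belief) over a finite set of underlying states, so that the infinite state space is the probability simplex. It enumerates a regular grid on that simplex and performs one Bayes update of a belief. The function names, the regular grid, and the matrix layout are illustrative assumptions; the abstract does not specify the paper's own grid construction or interpolation scheme.

```python
from fractions import Fraction
from itertools import combinations


def simplex_grid(n_states, resolution):
    """Return every belief vector whose entries are multiples of 1/resolution.

    Assumption: a regular grid; the paper's actual grid may differ.
    """
    points = []
    # Stars and bars: place n_states - 1 bars among the available slots;
    # the gaps between consecutive bars give the integer numerators.
    slots = resolution + n_states - 1
    for bars in combinations(range(slots), n_states - 1):
        edges = (-1,) + bars + (slots,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(n_states)]
        points.append(tuple(Fraction(c, resolution) for c in counts))
    return points


def belief_update(belief, transition, observation, z):
    """Bayes update of a belief for one fixed action and observation z.

    Hypothetical layout:
      transition[i][j]  = P(next state j | current state i)
      observation[j][z] = P(observation z | next state j)
    """
    n = len(belief)
    unnormalized = [
        observation[j][z] * sum(belief[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    grid = simplex_grid(n_states=3, resolution=4)
    print(len(grid), "grid points on the 3-state belief simplex")

    half = Fraction(1, 2)
    identity = [[1, 0], [0, 1]]  # underlying state never changes
    noisy = [[Fraction(3, 4), Fraction(1, 4)],
             [Fraction(1, 4), Fraction(3, 4)]]  # imperfect observation
    print(belief_update([half, half], identity, noisy, z=0))
```

An updated belief generally does not land on a grid point, which is why a grid-based method needs some rule for assigning values between grid points; that rule is not shown here.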
