A survey of algorithmic methods for partially observed Markov decision processes

A partially observed Markov decision process (POMDP) generalizes the Markov decision process to allow for incomplete information about the state of the system. The significant applied potential of such processes remains largely unrealized, owing to a historical lack of tractable solution methodologies. This paper reviews current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite: the decision maker must in general track a probability distribution (belief) over the system states. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high-precision solutions.
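To make the two points in the abstract concrete, the following is a minimal Python sketch, not drawn from the paper itself: all model numbers, the function names `belief_update` and `exact_backup`, and the discount factor are illustrative assumptions. It shows (i) the Bayesian belief update that defines the information state, and (ii) one Sondik-style exact dynamic-programming backup over alpha-vectors, in which the candidate set grows as |A| x |Gamma|^|Z| per stage; this combinatorial growth is the intractability the abstract describes.

```python
import itertools
import numpy as np

def belief_update(b, a, z, P, O):
    """Bayes update of belief b after taking action a and observing z.

    Conventions (assumed, standard POMDP notation):
      P[a][s, s'] = Pr(s' | s, a);  O[a][s', z] = Pr(z | s', a).
    """
    b_next = O[a][:, z] * (b @ P[a])          # unnormalized posterior
    norm = b_next.sum()
    if norm == 0.0:
        raise ValueError("observation z has zero probability under (b, a)")
    return b_next / norm

def exact_backup(Gamma, P, O, R, gamma):
    """One exact DP stage: from the alpha-vector set Gamma representing V_t,
    enumerate all candidate vectors of V_{t+1}, so that
    V_{t+1}(b) = max over candidates alpha of b . alpha.

    The cross-product over observation-to-vector assignments is the source
    of the blow-up: |A| * |Gamma|^|Z| candidates per stage (no pruning here).
    """
    n_actions, n_obs = len(P), O[0].shape[1]
    new_vectors = []
    for a in range(n_actions):
        for assign in itertools.product(Gamma, repeat=n_obs):
            alpha = R[a].copy()
            for z, alpha_z in enumerate(assign):
                # sum over s' of P[a][s, s'] * O[a][s', z] * alpha_z(s')
                alpha += gamma * P[a] @ (O[a][:, z] * alpha_z)
            new_vectors.append(alpha)
    return new_vectors

# Toy two-state, two-action, two-observation model (hypothetical numbers).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.6, 0.4], [0.4, 0.6]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
gamma = 0.95

b = np.array([0.5, 0.5])
b = belief_update(b, a=0, z=1, P=P, O=O)

Gamma = [np.zeros(2)]                 # V_0 = 0
for _ in range(3):
    Gamma = exact_backup(Gamma, P, O, R, gamma)
print(len(Gamma), "candidate vectors after 3 stages")   # 2, then 8, then 128
```

Practical exact algorithms prune the many dominated candidate vectors at each stage, and the approximation methodologies the paper surveys sidestep the enumeration altogether, for example by bounding the value function; the sketch above deliberately omits pruning to exhibit the raw growth.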
