Markov decision processes with noise-corrupted and delayed state observations

We consider a partially observed Markov decision process in which state observations are noise-corrupted and delayed by k time periods. We show that at stage t a sufficient statistic is the probability distribution of the underlying system state at stage t - k, together with all actions taken from stage t - k through stage t - 1. We show that improved observation quality and/or a shorter observation delay will not decrease the optimal expected total discounted reward, and we explore optimality conditions for three important special cases. We present a measure of the marginal value of receiving state observations delayed by (k - 1) stages rather than by k stages. We show that in the limit as k → ∞ the problem is equivalent to the completely unobserved case. We present numerical examples that illustrate the value of receiving state information delayed by k stages.
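
For concreteness, the following is a minimal sketch, not taken from the paper, of how the sufficient statistic described above can be maintained for a finite-state, finite-action model: the belief over the state at stage t - k is pushed forward one step with the oldest stored action, Bayes-updated with the newly arrived noise-corrupted observation, and the window of the k most recent actions is rotated. The transition matrices T, observation matrices O, and all function names here are illustrative assumptions rather than notation from the paper.

```python
import numpy as np

# Assumed (hypothetical) inputs, not from the paper:
#   T[a] : |S| x |S| transition matrix for action a (row = next-state distribution)
#   O[a] : |S| x |Z| observation matrix, P(signal z | state, preceding action a)
#   belief  : posterior over the state at stage t - k
#   actions : list [a_{t-k}, ..., a_{t-1}] of the k most recent actions

def update_sufficient_statistic(belief, actions, new_action, new_obs, T, O):
    """Advance (belief over state at t-k, last k actions) by one stage.

    The newly received observation was generated at stage t - k + 1, so the
    belief is first predicted forward with the oldest stored action and then
    conditioned on that observation; the action window is then rotated.
    """
    a_old = actions[0]
    predicted = belief @ T[a_old]           # belief over the state at t - k + 1
    likelihood = O[a_old][:, new_obs]       # P(new_obs | state) under an assumed obs. model
    posterior = predicted * likelihood
    posterior = posterior / posterior.sum() # Bayes normalization
    return posterior, actions[1:] + [new_action]

def current_state_belief(belief, actions, T):
    """Belief over the current state: propagate the delayed belief through the
    stored actions with no observation corrections (none have arrived yet)."""
    b = belief
    for a in actions:
        b = b @ T[a]
    return b

if __name__ == "__main__":
    # Toy 2-state, 2-action, 2-signal example with delay k = 2 (illustrative numbers).
    T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
         1: np.array([[0.5, 0.5], [0.3, 0.7]])}
    O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
         1: np.array([[0.8, 0.2], [0.3, 0.7]])}
    belief = np.array([0.5, 0.5])
    actions = [0, 1]                        # the k = 2 most recent actions
    belief, actions = update_sufficient_statistic(
        belief, actions, new_action=0, new_obs=1, T=T, O=O)
    print("delayed belief:", belief)
    print("current-state belief:", current_state_belief(belief, actions, T))
```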
