Markov decision processes with noise-corrupted and delayed state observations

We consider a partially observed Markov decision process in which state observations are noise-corrupted and delayed by k time periods. We show that at stage t a sufficient statistic is the probability distribution of the underlying system state at stage t - k, together with all actions taken from stage t - k through stage t - 1. We show that improved observation quality and/or a shorter observation delay will not decrease the optimal expected total discounted reward, and we explore optimality conditions for three important special cases. We present a measure of the marginal value of receiving state observations delayed by (k - 1) stages rather than by k stages. We show that in the limit as k → ∞ the problem is equivalent to the completely unobserved case. We present numerical examples that illustrate the value of receiving state information delayed by k stages.
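
For concreteness, the following is a minimal sketch, not taken from the paper, of how the sufficient statistic described above can be maintained for a finite-state, finite-action model: the belief over the state at stage t - k is pushed forward one step with the oldest stored action, Bayes-updated with the newly arrived noise-corrupted observation, and the window of the k most recent actions is rotated. The transition matrices T, observation matrices O, and all function names here are illustrative assumptions rather than notation from the paper.

```python
import numpy as np

# Assumed (hypothetical) inputs, not from the paper:
#   T[a] : |S| x |S| transition matrix for action a (row = next-state distribution)
#   O[a] : |S| x |Z| observation matrix, P(signal z | state, preceding action a)
#   belief  : posterior over the state at stage t - k
#   actions : list [a_{t-k}, ..., a_{t-1}] of the k most recent actions

def update_sufficient_statistic(belief, actions, new_action, new_obs, T, O):
    """Advance (belief over state at t-k, last k actions) by one stage.

    The newly received observation was generated at stage t - k + 1, so the
    belief is first predicted forward with the oldest stored action and then
    conditioned on that observation; the action window is then rotated.
    """
    a_old = actions[0]
    predicted = belief @ T[a_old]           # belief over the state at t - k + 1
    likelihood = O[a_old][:, new_obs]       # P(new_obs | state) under an assumed obs. model
    posterior = predicted * likelihood
    posterior = posterior / posterior.sum() # Bayes normalization
    return posterior, actions[1:] + [new_action]

def current_state_belief(belief, actions, T):
    """Belief over the current state: propagate the delayed belief through the
    stored actions with no observation corrections (none have arrived yet)."""
    b = belief
    for a in actions:
        b = b @ T[a]
    return b

if __name__ == "__main__":
    # Toy 2-state, 2-action, 2-signal example with delay k = 2 (illustrative numbers).
    T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
         1: np.array([[0.5, 0.5], [0.3, 0.7]])}
    O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
         1: np.array([[0.8, 0.2], [0.3, 0.7]])}
    belief = np.array([0.5, 0.5])
    actions = [0, 1]                        # the k = 2 most recent actions
    belief, actions = update_sufficient_statistic(
        belief, actions, new_action=0, new_obs=1, T=T, O=O)
    print("delayed belief:", belief)
    print("current-state belief:", current_state_belief(belief, actions, T))
```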
