Markov decision processes with delays and asynchronous cost collection

Markov decision processes (MDPs) may involve three types of delays. First, state information, rather than being available instantaneously, may arrive with a delay (observation delay). Second, an action may take effect at a later decision stage rather than immediately (action delay). Third, the cost induced by an action may be collected after a number of stages (cost delay). We derive two results, one for constant and one for random delays, for reducing an MDP with delays to an MDP without delays, which differs only in the size of the state space. The results are based on the intuition that costs may be collected asynchronously, i.e., at a stage other than the one in which they are induced, as long as they are discounted properly.
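
The constant-delay reduction described above can be sketched in code. The following is a minimal illustration, not the paper's exact construction: it assumes a toy 2-state, 2-action MDP (all numbers hypothetical) with a constant observation delay of one stage, builds the augmented state space (last observed state plus the pending action), and runs value iteration on the resulting delay-free MDP. Costs are credited via their expectation over the unobserved current state, which is one way to account for the delayed, properly discounted cost collection.

```python
import itertools
import numpy as np

# Toy MDP (all numbers hypothetical): 2 states, 2 actions.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])   # P[s, a, s'] transition probabilities
c = np.array([[1.0, 0.5], [0.2, 2.0]])     # c[s, a] immediate costs

# Constant observation delay d = 1: the controller sees the state one
# stage late. Augmented state = (last observed state, pending action).
aug_states = list(itertools.product(range(n_states), range(n_actions)))

def step(aug, a):
    """Next augmented-state distribution: the pending action resolves
    at the last observed state, and the new action a joins the queue."""
    s_old, a_pend = aug
    return {(s_new, a): P[s_old, a_pend, s_new] for s_new in range(n_states)}

def expected_cost(aug, a):
    """Cost of a is induced at the current (unobserved) state; take its
    expectation over that state's distribution given the augmented state."""
    s_old, a_pend = aug
    return float(sum(P[s_old, a_pend, s] * c[s, a] for s in range(n_states)))

# Value iteration on the delay-free augmented MDP.
V = {x: 0.0 for x in aug_states}
for _ in range(1000):
    V = {x: min(expected_cost(x, a)
                + gamma * sum(p * V[y] for y, p in step(x, a).items())
                for a in range(n_actions))
         for x in aug_states}

print({x: round(v, 3) for x, v in V.items()})
```

The augmented MDP has `n_states * n_actions**d` states, matching the abstract's point that the reduction changes only the size of the state space, not the structure of the problem.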
