Proper Policies in Infinite-State Stochastic Shortest Path Problems

We consider stochastic shortest path problems with infinite state and control spaces, a nonnegative cost per stage, and a termination state. We extend the notion of a proper policy, a policy that terminates within a finite expected number of steps, from the context of finite state spaces to the context of infinite state spaces. We consider the optimal cost function $J^*$ and the optimal cost function $\hat{J}$ over just the proper policies. We show that $J^*$ and $\hat{J}$ are the smallest and largest solutions of Bellman's equation, respectively, within a suitable class of Lyapunov-like functions. If the cost per stage is bounded, these functions are those that are bounded over the effective domain of $\hat{J}$. The standard value iteration algorithm may be attracted to either $J^*$ or $\hat{J}$, depending on the initial condition.
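
To make the last statement concrete, here is a minimal sketch on an assumed toy problem, not one taken from the paper: a single state with a zero-cost self-loop (the improper policy) and a unit-cost move to the termination state (the proper policy). Bellman's equation becomes $J = \min(1, J)$, whose nonnegative solutions are exactly $0 \le J \le 1$, with $J^* = 0$ and $\hat{J} = 1$; value iteration settles on a fixed point determined by its starting value.

```python
# A minimal sketch (assumed toy instance, not from the paper): one state with
# a zero-cost self-loop and a unit-cost transition to the termination state.
# Bellman's equation is J = min(1 + 0, 0 + J) = min(1, J).  Every J in [0, 1]
# solves it, with J* = 0 (improper looping policy) and J_hat = 1 (stopping).

def bellman(J):
    """One application of the Bellman operator for the toy problem."""
    stop = 1.0 + 0.0   # proper policy: pay 1, terminate (cost-to-go 0)
    loop = 0.0 + J     # improper policy: pay 0, remain at the same state
    return min(stop, loop)

def value_iteration(J0, iters=50):
    """Iterate the Bellman operator starting from the initial condition J0."""
    J = J0
    for _ in range(iters):
        J = bellman(J)
    return J

print(value_iteration(0.0))  # 0.0 -> attracted to J*
print(value_iteration(5.0))  # 1.0 -> attracted to J_hat
```

Starting at $0$ the iterates remain at $J^* = 0$, while any start at or above $1$ reaches $\hat{J} = 1$ in one step; in this particular toy instance, intermediate starting values in $(0, 1)$ are themselves fixed points.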
