Proper Policies in Infinite-State Stochastic Shortest Path Problems

We consider stochastic shortest path problems with infinite state and control spaces, a nonnegative cost per stage, and a termination state. We extend the notion of a proper policy, a policy that terminates within a finite expected number of steps, from the context of finite state spaces to the context of infinite state spaces. We consider the optimal cost function $J^*$ and the optimal cost function $\hat{J}$ over just the proper policies. We show that $J^*$ and $\hat{J}$ are the smallest and largest solutions of Bellman's equation, respectively, within a suitable class of Lyapunov-like functions. If the cost per stage is bounded, these functions are those that are bounded over the effective domain of $\hat{J}$. The standard value iteration algorithm may be attracted to either $J^*$ or $\hat{J}$, depending on the initial condition.
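
To make the last statement concrete, here is a minimal sketch on an assumed toy problem, not one taken from the paper: a single state with a zero-cost self-loop (the improper policy) and a unit-cost move to the termination state (the proper policy). Bellman's equation becomes $J = \min(1, J)$, whose nonnegative solutions are exactly $0 \le J \le 1$, with $J^* = 0$ and $\hat{J} = 1$; value iteration settles on a fixed point determined by its starting value.

```python
# A minimal sketch (assumed toy instance, not from the paper): one state with
# a zero-cost self-loop and a unit-cost transition to the termination state.
# Bellman's equation is J = min(1 + 0, 0 + J) = min(1, J).  Every J in [0, 1]
# solves it, with J* = 0 (improper looping policy) and J_hat = 1 (stopping).

def bellman(J):
    """One application of the Bellman operator for the toy problem."""
    stop = 1.0 + 0.0   # proper policy: pay 1, terminate (cost-to-go 0)
    loop = 0.0 + J     # improper policy: pay 0, remain at the same state
    return min(stop, loop)

def value_iteration(J0, iters=50):
    """Iterate the Bellman operator starting from the initial condition J0."""
    J = J0
    for _ in range(iters):
        J = bellman(J)
    return J

print(value_iteration(0.0))  # 0.0 -> attracted to J*
print(value_iteration(5.0))  # 1.0 -> attracted to J_hat
```

Starting at $0$ the iterates remain at $J^* = 0$, while any start at or above $1$ reaches $\hat{J} = 1$ in one step; in this particular toy instance, intermediate starting values in $(0, 1)$ are themselves fixed points.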
