Threshold probability of non-terminal type in finite horizon Markov decision processes

Abstract. We consider a class of problems concerned with maximizing probabilities, given stage-wise targets, which generalizes the standard threshold probability problem in Markov decision processes. The objective function is the probability that, at all stages, the associatively combined accumulation of rewards earned up to that point takes its value in a specified stage-wise interval. It is shown that this class reduces to the case of the nonnegative-valued multiplicative criterion through an invariant imbedding technique. We derive a recursive formula for the optimal value function and an effective method for obtaining the optimal policies.
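As an illustration only, the following Python sketch shows one way a backward recursion of this kind can be organized. It is not the recursive formula derived in the paper. It assumes a finite state and action set, takes ordinary addition as the associative accumulation, and augments the state with the accumulated value in the spirit of invariant imbedding. The probability of staying inside every stage-wise interval is then the expectation of a product of 0/1 indicators, which is a nonnegative multiplicative criterion. The names `max_stagewise_probability`, `TRANSITIONS`, `intervals`, and `combine` are hypothetical and chosen for this example.

```python
from functools import lru_cache

# Hypothetical toy model: one state "s" and two actions.
# Each outcome is a tuple (next_state, probability, reward).
TRANSITIONS = {
    "s": {
        "safe": [("s", 1.0, 1.0)],
        "risky": [("s", 0.5, 2.0), ("s", 0.5, 0.0)],
    }
}


def max_stagewise_probability(transitions, intervals, start_state,
                              identity=0.0, combine=lambda c, r: c + r):
    """Return (max probability, optimal first action).

    intervals[n] = (lo, hi) is the closed interval that the accumulated
    reward must lie in after the (n+1)-th reward has been earned.
    combine is assumed to be associative; addition is used by default.
    """
    horizon = len(intervals)

    @lru_cache(maxsize=None)
    def value(stage, state, acc):
        # All stage-wise targets were met: the remaining product is 1.
        if stage == horizon:
            return 1.0, None
        lo, hi = intervals[stage]
        best_p, best_a = 0.0, None
        for action, outcomes in transitions[state].items():
            p = 0.0
            for next_state, prob, reward in outcomes:
                new_acc = combine(acc, reward)
                # Indicator of the stage-wise interval times the
                # optimal continuation value on the augmented state.
                if lo <= new_acc <= hi:
                    p += prob * value(stage + 1, next_state, new_acc)[0]
            if best_a is None or p > best_p:
                best_p, best_a = p, action
        return best_p, best_a

    return value(0, start_state, identity)


# A loose first-stage target and a tight first-stage target.
loose = max_stagewise_probability(TRANSITIONS, [(0, 2), (2, 3)], "s")
tight = max_stagewise_probability(TRANSITIONS, [(2, 2), (2, 3)], "s")
```

In this toy model the optimal action depends on the accumulated value and on the intermediate target, not only on the current state and the terminal target, which is why the recursion runs over the augmented state `(state, acc)`. Memoizing on `acc` is practical only when the set of reachable accumulated values is small, as it is here.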
