How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?

This paper compares discrete-time Markov decision processes (MDPs) that differ only in their transition probabilities. We show that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations, and we give conditions for continuity with respect to suitable probability metrics. The results are applied to well-known examples, including inventory control and optimal stopping.
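The monotonicity claim can be illustrated numerically. The sketch below (a hypothetical toy MDP, not from the paper) builds two finite-horizon MDPs that are identical except for their transition kernels: for every state and action, the "high" kernel stochastically dominates the "low" one. With rewards increasing in the state and both kernels stochastically monotone, value iteration should yield an optimal value function for the dominated model that is pointwise below the dominating one.

```python
import numpy as np

# Toy setup (illustrative assumptions, not from the paper):
# states 0..2, two actions, horizon-5 discounted value iteration.
n_states, n_actions, horizon, beta = 3, 2, 5, 0.9

# Reward increasing in the state; action 1 carries a small cost.
r = np.array([[s - 0.1 * a for a in range(n_actions)] for s in range(n_states)])

def kernel(p_up):
    """Transition matrix: move up one state w.p. p_up, else stay."""
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        up = min(s + 1, n_states - 1)
        P[s, up] += p_up
        P[s, s] += 1.0 - p_up
    return P

# P[a] is the transition matrix under action a.
# P_hi(.|s,a) stochastically dominates P_lo(.|s,a) for every (s, a).
P_lo = [kernel(0.0), kernel(0.5)]
P_hi = [kernel(0.3), kernel(0.8)]

def optimal_value(P):
    """Finite-horizon value iteration for the optimal value function."""
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = np.stack([r[:, a] + beta * P[a] @ V for a in range(n_actions)],
                     axis=1)
        V = Q.max(axis=1)
    return V

V_lo, V_hi = optimal_value(P_lo), optimal_value(P_hi)

# Monotonicity in the stochastic order: dominated kernel, smaller values.
assert np.all(V_lo <= V_hi + 1e-12)
```

The assertion holds for the standard reason the paper formalizes: if the next-stage value function is increasing in the state, stochastic dominance of the transition law raises its expectation, and induction over the horizon propagates the ordering to the optimal values.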
