Paper information - How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?

How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?

The present work compares discrete-time Markov decision processes (MDPs) that differ only in their transition probabilities. We show that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations. We also find conditions for continuity with respect to suitable probability metrics. The results are applied to some well-known examples, including inventory control and optimal stopping.
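As a toy illustration of the monotonicity claim (not taken from the paper), the sketch below runs value iteration on two small MDPs that differ only in their transition probabilities. Everything in it is an assumption made for illustration: the clipped random walk on states 0..5, the reward `s - effort_cost * a`, the discount factor, and the function name `value_iteration`. The second kernel moves up with higher probability under every action, so it dominates the first in the usual stochastic order. Since the reward is increasing in the state and each kernel is stochastically monotone, the value function of the second MDP should be at least as large in every state.

```python
def value_iteration(up_prob, n_states=6, discount=0.9, effort_cost=0.5, sweeps=500):
    """Approximate the optimal discounted value function of a toy MDP.

    up_prob[a] is the (hypothetical) probability of moving one state up
    under action a; otherwise the process moves one state down. Moves are
    clipped at the boundary states 0 and n_states - 1.
    """
    top = n_states - 1
    v = [0.0] * n_states
    for _ in range(sweeps):
        new_v = []
        for s in range(n_states):
            up, down = min(s + 1, top), max(s - 1, 0)
            # One-step reward (increasing in s) plus discounted expected value.
            q_values = [
                (s - effort_cost * a) + discount * (p * v[up] + (1 - p) * v[down])
                for a, p in enumerate(up_prob)
            ]
            new_v.append(max(q_values))
        v = new_v
    return v


# Two MDPs that differ only in their transition probabilities.
v_p = value_iteration(up_prob=(0.3, 0.6))
v_q = value_iteration(up_prob=(0.4, 0.7))  # stochastically larger kernel

# Pointwise comparison of the two value functions.
ordered = all(b >= a for a, b in zip(v_p, v_q))
print(ordered)
for s, (a, b) in enumerate(zip(v_p, v_q)):
    print(s, round(a, 3), round(b, 3))
```

The comparison relies on the value function being increasing in the state. Without a monotone kernel and an increasing reward, a stochastically larger transition law need not give a larger value.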

Alfred Müller | A. Müller

[1] Alfred Müller, et al. Optimal selection from distributions with unknown parameters: Robustness of Bayesian models, 1996, Math. Methods Oper. Res.

[2] Arthur F. Veinott, Jr. On the Optimality of $(s, S)$ Inventory Policies: New Conditions and a New Proof, 1966.

[3] Thomas S. Ferguson, et al. Who Solved the Secretary Problem, 1989.

[4] Ward Whitt, et al. Comparison methods for queues and other stochastic models, 1986.

[5] J. Wessels. Markov programming by successive approximations with respect to weighted supremum norms, 1976, Advances in Applied Probability.

[6] Onésimo Hernández-Lerma, et al. Controlled Markov Processes, 1965.

[7] R. M. Dudley, et al. Real Analysis and Probability, 1989.

[8] A. Müller. Stochastic Orders Generated by Integrals: a Unified Study, 1997, Advances in Applied Probability.

[9] Rudi Zagst, et al. Monotonicity and bounds for convex stochastic control models, 1994, Math. Methods Oper. Res.

[10] V. M. Zolotarev, et al. Addendum: Probability Metrics, 1984.

[11] Steven A. Lippman, et al. Job search in a dynamic economy, 1976.