Paper information - Nonuniqueness versus Uniqueness of Optimal Policies in Convex Discounted Markov Decision Processes


From the classical point of view, it is important to determine whether, in a Markov decision process (MDP), the optimal policies are guaranteed to be unique in addition to existing. It is well known that uniqueness does not always hold in optimization problems (for instance, in linear programming). On the other hand, in such problems a slight perturbation of the cost functional can restore uniqueness. In this paper, it is proved that, under adequate conditions, the value functions of an MDP and of its cost-perturbed version stay close, which in some sense is a priority. We are interested in the stability of Markov decision processes with respect to perturbations of the cost-as-you-go function.
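As a rough illustration of the idea in the abstract, the following sketch uses a small finite MDP rather than the convex setting of the paper. The two-state model, the discount factor `alpha`, the perturbation `eps * (a + 1)`, and the helper names `value_iteration` and `optimal_actions` are all assumptions made for this example, not taken from the paper. The sketch shows a tie between optimal actions being broken by a small cost perturbation, and checks the standard finite-state bound that the two value functions differ by at most the sup-norm distance between the costs divided by `1 - alpha`.

```python
def value_iteration(cost, trans, alpha, tol=1e-12, max_iter=100_000):
    """Discounted optimal value function of a finite MDP (cost minimization)."""
    v = [0.0] * len(cost)
    for _ in range(max_iter):
        new_v = [
            min(
                cost[s][a] + alpha * sum(p * v[t] for t, p in enumerate(trans[s][a]))
                for a in range(len(cost[s]))
            )
            for s in range(len(cost))
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def optimal_actions(cost, trans, alpha, v, tie_tol=1e-9):
    """For each state, every action attaining the minimum in the Bellman equation."""
    result = []
    for s in range(len(cost)):
        q = [
            cost[s][a] + alpha * sum(p * v[t] for t, p in enumerate(trans[s][a]))
            for a in range(len(cost[s]))
        ]
        best = min(q)
        result.append([a for a, qa in enumerate(q) if qa - best <= tie_tol])
    return result


# Hypothetical two-state, two-action model (illustrative numbers only).
alpha = 0.9
cost = [[1.0, 1.0],   # state 0: both actions cost the same, so they tie
        [2.0, 3.0]]   # state 1: action 0 is strictly cheaper
trans = [[[0.0, 1.0], [0.0, 1.0]],   # state 0: either action moves to state 1
         [[0.0, 1.0], [0.0, 1.0]]]   # state 1: either action stays in state 1

# Assumed perturbation: add eps * (a + 1) to the cost of action a.
eps = 0.01
cost_eps = [[c + eps * (a + 1) for a, c in enumerate(row)] for row in cost]

v = value_iteration(cost, trans, alpha)
v_eps = value_iteration(cost_eps, trans, alpha)

ties = optimal_actions(cost, trans, alpha, v)               # optimal actions, original costs
unique = optimal_actions(cost_eps, trans, alpha, v_eps)     # optimal actions, perturbed costs

# Sup-norm distance between the value functions and the standard bound on it.
gap = max(abs(x - y) for x, y in zip(v, v_eps))
cost_dist = max(abs(c - d) for r, q in zip(cost, cost_eps) for c, d in zip(r, q))
bound = cost_dist / (1.0 - alpha)

print(ties, unique, gap, bound)
```

In this toy model the original problem has two optimal actions in state 0, while the perturbed problem has a single optimal action in every state, and `gap` stays below `bound`. How such closeness is established in the paper's convex setting depends on its own conditions, which this sketch does not reproduce.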

Raúl Montes-de-Oca | Enrique Lemus-Rodríguez | Francisco Salem-Silva
