Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

This paper describes sufficient conditions for the existence of optimal policies for Partially Observable Markov Decision Processes (POMDPs) with Borel state, observation, and action sets and with expected total costs. Action sets need not be compact, and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semi-continuity of value functions, and convergence of value iterations to optimal values. Since a POMDP can be reduced to a Completely Observable Markov Decision Process (COMDP) whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities of a COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities of a COMDP are weakly continuous if the transition probabilities of the underlying Markov Decision Process are weakly continuous and the observation probabilities of the POMDP are continuous in total variation. Moreover, continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with examples and counterexamples.
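The reduction of a POMDP to a COMDP can be sketched in the special case of finite state, action, and observation sets. This is a hypothetical illustration, not the paper's construction. The function names (`predict`, `belief_update`, `comdp_transition`, `expected_cost`, `value`), the array layouts, the nonnegative costs, and the convention that an observation depends on the action and the new state are all assumptions made here. With finite sets, weak continuity holds trivially, so the sketch shows only the mechanics: the posterior (belief) update, the transition probabilities it induces on beliefs, and value iteration on the resulting COMDP. It does not illustrate the continuity results themselves.

```python
from typing import List, Tuple

# Hypothetical finite-POMDP encoding (an assumption, not the paper's notation):
#   P[a][x][x2] = probability of moving from state x to state x2 under action a
#   Q[a][x2][y] = probability of observing y when the new state is x2 after action a
#   c[x][a]     = one-step cost of action a in state x (assumed nonnegative)
Matrix = List[List[float]]
Belief = List[float]


def predict(z: Belief, a: int, P: List[Matrix]) -> Belief:
    """Prior distribution of the next state given belief z and action a."""
    n = len(z)
    return [sum(z[x] * P[a][x][x2] for x in range(n)) for x2 in range(n)]


def belief_update(z: Belief, a: int, y: int, P: List[Matrix], Q: List[Matrix]) -> Belief:
    """Posterior state distribution (Bayes' rule) after action a and observation y."""
    pred = predict(z, a, P)
    joint = [pred[x2] * Q[a][x2][y] for x2 in range(len(pred))]
    norm = sum(joint)  # probability of observing y
    if norm == 0.0:
        raise ValueError("observation y has zero probability under belief z and action a")
    return [j / norm for j in joint]


def comdp_transition(z: Belief, a: int, P: List[Matrix], Q: List[Matrix]) -> List[Tuple[float, Belief]]:
    """COMDP transition probabilities from belief z under action a:
    one (probability, next belief) pair per observation of positive probability."""
    pred = predict(z, a, P)
    out = []
    for y in range(len(Q[a][0])):
        joint = [pred[x2] * Q[a][x2][y] for x2 in range(len(pred))]
        p = sum(joint)
        if p > 0.0:
            out.append((p, [j / p for j in joint]))
    return out


def expected_cost(z: Belief, a: int, c: Matrix) -> float:
    """One-step cost of the COMDP: the cost c averaged over the belief z."""
    return sum(z[x] * c[x][a] for x in range(len(z)))


def value(z: Belief, n: int, P: List[Matrix], Q: List[Matrix], c: Matrix) -> float:
    """n-th value iteration from the zero function:
    v_n(z) = min over a of [expected_cost(z, a) + sum of p * v_{n-1}(z') over next beliefs z'].
    The recursion is exponential in n, so this is meant only for tiny examples."""
    if n == 0:
        return 0.0
    best = float("inf")
    for a in range(len(P)):
        future = sum(p * value(z_next, n - 1, P, Q, c) for p, z_next in comdp_transition(z, a, P, Q))
        best = min(best, expected_cost(z, a, c) + future)
    return best
```

For any belief `z` and action `a`, the probabilities returned by `comdp_transition` sum to 1, and averaging the next beliefs with these weights recovers `predict(z, a, P)`; this is the consistency property the belief-state transition must satisfy. With nonnegative costs, `value(z, n)` is nondecreasing in `n`. In the general Borel setting, conditions such as those introduced in the paper are needed to guarantee that these value iterations converge to the optimal values.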
