On optimal control of Markov chains with safety constraint

We study the control of completely observed Markov chains with safety bounds, as introduced by Arapostathis et al. (2005), but with more general safety constraints and the added requirement of optimality. In that work the safety bounds were specified as pairs of unit-interval valued vectors (componentwise lower and upper bounds on the state probability distribution). In this paper we generalize the constraint set to an arbitrary linear convex set and present a way to compute a stationary control policy that is safe (i.e., keeps the state distribution safe whenever the initial distribution is safe) and at the same time long-run average optimal. We propose a linear programming formulation for computing such a safe optimal policy. Under the simplifying assumption that the optimal policy is ergodic, we present a finitely terminating iterative algorithm that computes the maximal invariant safe set (MISS), the set in which the initial distribution must lie so that all future distributions remain safe. Our approach also yields an upper bound on the number of iterations the algorithm needs to terminate; in particular, for two-state chains we show that at most one iteration is needed to compute the MISS.
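
To make the linear programming formulation concrete, the sketch below sets up a standard occupation-measure LP for a long-run average-cost chain and imposes the safe set as linear constraints on the induced stationary distribution. This is a minimal illustration, not the paper's exact construction: the function name safe_average_optimal_policy, the data layout P[a] and c[s, a], and the constraint form A_safe @ pi <= b_safe are assumptions made here for exposition.

```python
import numpy as np
from scipy.optimize import linprog

def safe_average_optimal_policy(P, c, A_safe, b_safe):
    """Occupation-measure LP sketch for a long-run average-cost MDP whose
    stationary distribution must lie in the linear safe set
    {pi : A_safe @ pi <= b_safe}.  P[a][s, s'] are transition matrices,
    c[s, a] are one-step costs.  (Illustrative, not the paper's exact LP.)"""
    n_actions = len(P)
    n_states = P[0].shape[0]
    n_vars = n_states * n_actions            # x[s, a]: occupation measures
    idx = lambda s, a: s * n_actions + a

    # Objective: minimize the expected average cost sum_{s,a} c[s,a] x[s,a].
    cost = np.array([c[s, a] for s in range(n_states) for a in range(n_actions)])

    # Balance constraints: sum_a x[s',a] - sum_{s,a} P[a][s,s'] x[s,a] = 0,
    # plus the normalization sum_{s,a} x[s,a] = 1.
    A_eq = np.zeros((n_states + 1, n_vars))
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[sp, idx(sp, a)] += 1.0
            for s in range(n_states):
                A_eq[sp, idx(s, a)] -= P[a][s, sp]
    A_eq[n_states, :] = 1.0
    b_eq = np.zeros(n_states + 1)
    b_eq[n_states] = 1.0

    # Safety: A_safe @ pi <= b_safe, where pi[s] = sum_a x[s,a].
    A_ub = np.zeros((A_safe.shape[0], n_vars))
    for i in range(A_safe.shape[0]):
        for s in range(n_states):
            for a in range(n_actions):
                A_ub[i, idx(s, a)] = A_safe[i, s]

    res = linprog(cost, A_ub=A_ub, b_ub=b_safe, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_vars)
    x = res.x.reshape(n_states, n_actions)
    pi = x.sum(axis=1)
    # Recover a randomized stationary policy: phi(a|s) = x[s,a] / pi[s].
    policy = np.divide(x, pi[:, None], out=np.full_like(x, np.nan),
                       where=pi[:, None] > 0)
    return policy, res.fun
```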
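
Likewise, a minimal sketch of the fixed-point iteration behind the MISS computation, for a chain P that is already fixed (e.g., induced by the safe optimal policy): repeatedly intersect the safe set with its one-step pre-image until no new constraints appear. The redundancy test via auxiliary LPs, the column-vector convention pi_{t+1} = P.T @ pi, and the cap max_iter are assumptions for illustration; the paper's finite-termination bound applies to its own algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def maximal_invariant_safe_set(P, A, b, max_iter=50, tol=1e-9):
    """Fixed-point sketch for the maximal invariant safe set of a fixed
    chain P, with safe set {pi : A @ pi <= b} and pi_{t+1} = P.T @ pi.
    Illustrative only; not the controlled setting of the paper."""
    n = P.shape[0]
    # Distributions live on the simplex: pi >= 0 via bounds, sum pi = 1 via A_eq.
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])
    A_cur, b_cur = A.copy(), b.copy()

    def redundant(a_row, b_val):
        # a_row @ pi <= b_val is redundant iff its maximum over the current
        # set (intersected with the simplex) does not exceed b_val.
        res = linprog(-a_row, A_ub=A_cur, b_ub=b_cur, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * n)
        return res.success and -res.fun <= b_val + tol

    for _ in range(max_iter):
        # One-step pre-image of the current set: A_cur @ (P.T @ pi) <= b_cur.
        A_new = A_cur @ P.T
        keep = [i for i in range(A_new.shape[0])
                if not redundant(A_new[i], b_cur[i])]
        if not keep:                      # fixed point: the set is invariant
            return A_cur, b_cur
        A_cur = np.vstack([A_cur, A_new[keep]])
        b_cur = np.concatenate([b_cur, b_cur[keep]])
    return A_cur, b_cur                   # iteration cap reached
```

For a two-state chain the simplex is one-dimensional, so each constraint reduces to an interval endpoint; this is consistent with the abstract's claim that at most one iteration is needed there.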