On optimal control of Markov chains with safety constraint

We study the control of completely observed Markov chains with safety bounds, as introduced by Arapostathis et al. (2005), but with more general safety constraints and the added requirement of optimality. In that work the safety bounds were specified as pairs of unit-interval valued vectors (componentwise lower and upper bounds on the state probability distribution). In this paper we generalize the constraint set to an arbitrary linear convex set and present a way to compute a stationary control policy that is safe (i.e., keeps the state distribution safe whenever the initial distribution is safe) and at the same time long-run average optimal. We propose a linear programming formulation for computing such a safe optimal policy. Under the simplifying assumption that the optimal policy is ergodic, we present a finitely terminating iterative algorithm that computes the maximal invariant safe set (MISS), the set in which the initial distribution must lie so that all future distributions remain safe. Our approach also yields an upper bound on the number of iterations the algorithm needs to terminate; in particular, for two-state chains we show that at most one iteration is needed to compute the MISS.
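
To make the linear programming formulation concrete, the sketch below sets up a standard occupation-measure LP for a long-run average-cost chain and imposes the safe set as linear constraints on the induced stationary distribution. This is a minimal illustration, not the paper's exact construction: the function name safe_average_optimal_policy, the data layout P[a] and c[s, a], and the constraint form A_safe @ pi <= b_safe are assumptions made here for exposition.

```python
import numpy as np
from scipy.optimize import linprog

def safe_average_optimal_policy(P, c, A_safe, b_safe):
    """Occupation-measure LP sketch for a long-run average-cost MDP whose
    stationary distribution must lie in the linear safe set
    {pi : A_safe @ pi <= b_safe}.  P[a][s, s'] are transition matrices,
    c[s, a] are one-step costs.  (Illustrative, not the paper's exact LP.)"""
    n_actions = len(P)
    n_states = P[0].shape[0]
    n_vars = n_states * n_actions            # x[s, a]: occupation measures
    idx = lambda s, a: s * n_actions + a

    # Objective: minimize the expected average cost sum_{s,a} c[s,a] x[s,a].
    cost = np.array([c[s, a] for s in range(n_states) for a in range(n_actions)])

    # Balance constraints: sum_a x[s',a] - sum_{s,a} P[a][s,s'] x[s,a] = 0,
    # plus the normalization sum_{s,a} x[s,a] = 1.
    A_eq = np.zeros((n_states + 1, n_vars))
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[sp, idx(sp, a)] += 1.0
            for s in range(n_states):
                A_eq[sp, idx(s, a)] -= P[a][s, sp]
    A_eq[n_states, :] = 1.0
    b_eq = np.zeros(n_states + 1)
    b_eq[n_states] = 1.0

    # Safety: A_safe @ pi <= b_safe, where pi[s] = sum_a x[s,a].
    A_ub = np.zeros((A_safe.shape[0], n_vars))
    for i in range(A_safe.shape[0]):
        for s in range(n_states):
            for a in range(n_actions):
                A_ub[i, idx(s, a)] = A_safe[i, s]

    res = linprog(cost, A_ub=A_ub, b_ub=b_safe, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_vars)
    x = res.x.reshape(n_states, n_actions)
    pi = x.sum(axis=1)
    # Recover a randomized stationary policy: phi(a|s) = x[s,a] / pi[s].
    policy = np.divide(x, pi[:, None], out=np.full_like(x, np.nan),
                       where=pi[:, None] > 0)
    return policy, res.fun
```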
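
Likewise, a minimal sketch of the fixed-point iteration behind the MISS computation, for a chain P that is already fixed (e.g., induced by the safe optimal policy): repeatedly intersect the safe set with its one-step pre-image until no new constraints appear. The redundancy test via auxiliary LPs, the column-vector convention pi_{t+1} = P.T @ pi, and the cap max_iter are assumptions for illustration; the paper's finite-termination bound applies to its own algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def maximal_invariant_safe_set(P, A, b, max_iter=50, tol=1e-9):
    """Fixed-point sketch for the maximal invariant safe set of a fixed
    chain P, with safe set {pi : A @ pi <= b} and pi_{t+1} = P.T @ pi.
    Illustrative only; not the controlled setting of the paper."""
    n = P.shape[0]
    # Distributions live on the simplex: pi >= 0 via bounds, sum pi = 1 via A_eq.
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])
    A_cur, b_cur = A.copy(), b.copy()

    def redundant(a_row, b_val):
        # a_row @ pi <= b_val is redundant iff its maximum over the current
        # set (intersected with the simplex) does not exceed b_val.
        res = linprog(-a_row, A_ub=A_cur, b_ub=b_cur, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * n)
        return res.success and -res.fun <= b_val + tol

    for _ in range(max_iter):
        # One-step pre-image of the current set: A_cur @ (P.T @ pi) <= b_cur.
        A_new = A_cur @ P.T
        keep = [i for i in range(A_new.shape[0])
                if not redundant(A_new[i], b_cur[i])]
        if not keep:                      # fixed point: the set is invariant
            return A_cur, b_cur
        A_cur = np.vstack([A_cur, A_new[keep]])
        b_cur = np.concatenate([b_cur, b_cur[keep]])
    return A_cur, b_cur                   # iteration cap reached
```

For a two-state chain the simplex is one-dimensional, so each constraint reduces to an interval endpoint; this is consistent with the abstract's claim that at most one iteration is needed there.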