On controlled Markov chains with optimality requirement and safety constraint

We study the control of completely observed Markov chains subject to a generalized safety constraint and a long-run average optimality requirement. In earlier work, the safety constraint was specified as a pair of unit-interval-valued vectors giving lower and upper bounds on each component of the state probability distribution. In this paper, we generalize the constraint to an arbitrary convex set defined by linear inequalities in which the distribution must remain, and present a method to compute a stationary control policy that is both safe and long-run average optimal. This policy guarantees the safety of the system in its limiting behavior, and is derived through a linear programming formulation whose feasibility we also examine. To ensure the safety of the system's transient behavior under a policy that induces a unique limiting distribution in the interior of the constraint set, we present a finitely terminating iterative algorithm that computes the maximal invariant safe set (MISS): the set of initial distributions from which every future distribution is also safe. A theoretical upper bound on the number of iterations is provided. Furthermore, we introduce a simplified algorithm that may require less computation, and illustrate it with numerical examples. In particular, we obtain a closed-form representation of the MISS for two-state systems, requiring at most one iteration of the algorithm.
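The MISS iteration described above can be illustrated for the two-state case. The following is a minimal sketch, not the paper's algorithm: it assumes a fixed closed-loop transition matrix `P` (hypothetical numbers) and a safety interval `[lo, hi]` on the probability of state 0, and repeatedly intersects the current set with its pre-image under the one-step affine distribution update until the set becomes invariant.

```python
import numpy as np

# Hypothetical two-state example: closed-loop transition matrix P
# (row-stochastic) and assumed safety bounds [lo, hi] on pi[0],
# the probability of state 0. The one-step update is the affine map
#   pi'[0] = pi[0] * P[0, 0] + (1 - pi[0]) * P[1, 0].
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def step(p0):
    """One-step update of the probability of state 0."""
    return p0 * P[0, 0] + (1 - p0) * P[1, 0]

def miss_interval(lo, hi, max_iter=100, tol=1e-12):
    """Shrink [lo, hi] until it is invariant under `step`.

    Assumes P[0, 0] != P[1, 0], so the update is a non-constant
    affine map a * p + b.
    """
    a = P[0, 0] - P[1, 0]
    b = P[1, 0]
    for _ in range(max_iter):
        # Pre-image of [lo, hi] under p -> a * p + b.
        if a > 0:
            pre_lo, pre_hi = (lo - b) / a, (hi - b) / a
        else:
            pre_lo, pre_hi = (hi - b) / a, (lo - b) / a
        # Intersect the current set with its pre-image.
        new_lo, new_hi = max(lo, pre_lo), min(hi, pre_hi)
        if abs(new_lo - lo) < tol and abs(new_hi - hi) < tol:
            break  # invariant: the set no longer shrinks
        lo, hi = new_lo, new_hi
    return lo, hi
```

With these numbers the safety interval `[0.4, 0.9]` is already invariant (its image under `step` is `[0.54, 0.84]`), so the iteration terminates immediately, consistent with the claim that the two-state case needs at most one iteration.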
