Controlled Markov Processes With Safety State Constraints

This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) at all time epochs. In the standard MDP framework, safety is typically addressed indirectly by penalizing failure states through the reward function. Such an approach, however, cannot impose hard constraints on the state pd, which is a problem in practical applications where the probability of failure must stay within prescribed bounds. In this paper, we explicitly separate the state constraints from the reward function. We provide analysis and synthesis methods that impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches, in which such constraints can be imposed only on the stationary distributions. We show that, unlike unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs that synthesize feasible decision-making policies which satisfy the safety constraints at all time epochs and ensure that performance stays above a computable lower bound. We develop linear programming implementations of the proposed algorithms, formulated using the duality theory of convex optimization. A swarm control simulation example demonstrates the use of the proposed algorithms.
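To make the notion of a safety constraint "at all time epochs" concrete, the following is a minimal analysis sketch, not the paper's synthesis algorithm or its linear programming formulation. It assumes a finite MDP with transition tensor `P[a, i, j]`, a fixed stationary randomized policy `pi[i, a]`, a state pd propagated as a row vector, and a polytopic safe set written as `H x <= h` in our own notation. The helper names `closed_loop_matrix` and `first_violation` are hypothetical. The example shows that a policy whose stationary pd satisfies the bound can still violate it in the transient, and that the outcome depends on the initial pd.

```python
import numpy as np


def closed_loop_matrix(P, pi):
    """Closed-loop Markov matrix M[i, j] = sum_a pi[i, a] * P[a, i, j].

    P  : array (m, n, n); P[a, i, j] = Pr(next state j | state i, action a)
    pi : array (n, m);    pi[i, a]   = Pr(action a | state i)
    """
    return np.einsum("ia,aij->ij", pi, P)


def first_violation(P, pi, x0, H, h, horizon, tol=1e-9):
    """Propagate x_{t+1} = x_t M and check H x_t <= h for t = 0..horizon.

    Returns the first epoch t at which the polytopic constraint fails,
    or None if every checked epoch satisfies it.
    """
    M = closed_loop_matrix(P, pi)
    x = np.asarray(x0, dtype=float)
    for t in range(horizon + 1):
        if np.any(H @ x > h + tol):
            return t
        x = x @ M  # one step of the state pd under the fixed policy
    return None


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action example: action 0 stays, action 1 switches.
    P = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])])
    pi = np.array([[0.6, 0.4],   # in state 0: stay w.p. 0.6, switch w.p. 0.4
                   [0.0, 1.0]])  # in state 1: always switch
    # Safety constraint: Pr(state 1) <= 0.3 at every epoch.
    H = np.array([[0.0, 1.0]])
    h = np.array([0.3])
    print(first_violation(P, pi, [1.0, 0.0], H, h, horizon=20))    # 1
    print(first_violation(P, pi, [0.72, 0.28], H, h, horizon=20))  # None
```

Here the closed-loop matrix is `[[0.6, 0.4], [1.0, 0.0]]`, whose stationary pd is roughly `(0.714, 0.286)` and so satisfies the bound. Starting from `(1, 0)`, however, the pd overshoots to `(0.6, 0.4)` at epoch 1, whereas starting from `(0.72, 0.28)` the constraint holds at every epoch checked. A stationary-only constraint would accept this policy in both cases.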
