Risk-averse control of Markov decision processes with ω-regular objectives

Many control problems in environments that can be modeled as Markov decision processes (MDPs) concern infinite-time horizon specifications. The classical aim in this context is to compute a control policy that maximizes the probability of satisfying the specification. In many scenarios, however, there is a non-zero probability of failure in every step of the system's execution. For infinite-time horizon specifications, this implies that the specification is violated with probability 1 in the long run, no matter which policy is chosen, so that previous policy computation methods are not useful in these scenarios. In this paper, we introduce a new optimization criterion for MDP policies that captures the task of working towards the satisfaction of an infinite-time horizon ω-regular specification. The new criterion is applicable to MDPs in which the violation of the specification cannot be avoided in the long run. We give an algorithm to compute policies that are optimal under this criterion and show that it captures the ideas of both optimism and risk aversion in MDP control: the computed policies are optimistic in that an MDP run enters a failure state relatively late, and they are risk-averse in that they always maximize the probability of reaching their respective next goal state. We report results on two robot control scenarios to validate the usability of risk-averse MDP policies.
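
The paper's algorithm itself is not reproduced here, but the risk-averse building block the abstract describes, maximizing the probability of reaching the next goal state, corresponds to the standard maximal-reachability computation for MDPs. The following Python sketch illustrates that computation via value iteration; the MDP encoding (a transition dictionary keyed by state-action pairs) and all identifiers are illustrative assumptions for this example, not the paper's notation.

```python
# Minimal sketch (not the paper's algorithm): maximal probability of
# reaching a goal set in an MDP, computed by value iteration. The MDP
# encoding below is an assumption made for this illustration.

def max_reach_probability(states, actions, trans, goal, eps=1e-9):
    """actions(s) lists the actions enabled in s; trans[(s, a)] is a
    dict {successor: probability}; goal is a set of goal states."""
    # Least fixed point: start from 0 everywhere except the goal, so the
    # iteration converges to the *maximal* reachability probabilities.
    v = {s: (1.0 if s in goal else 0.0) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in goal:
                continue  # goal states keep value 1
            # Bellman backup: the best action maximizes the expected value.
            best = max((sum(p * v[t] for t, p in trans[(s, a)].items())
                        for a in actions(s)), default=0.0)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < eps:
            return v

# Tiny example: from s0, a "safe" action reaches the goal g with
# probability 0.9 (failure sink otherwise), a "risky" one with only 0.5.
states = {"s0", "g", "fail"}
trans = {("s0", "safe"): {"g": 0.9, "fail": 0.1},
         ("s0", "risky"): {"g": 0.5, "fail": 0.5}}

def enabled(s):
    return ["safe", "risky"] if s == "s0" else []

v = max_reach_probability(states, enabled, trans, {"g"})
assert abs(v["s0"] - 0.9) < 1e-6
```

A risk-averse policy in the abstract's sense would, presumably, apply such a reachability computation towards each successive goal state; how the successive goals and the overall criterion are defined is specified in the paper itself.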
