Optimal control of MDPs with temporal logic constraints

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a correctness specification expressed as a formula of Linear Temporal Logic, while minimizing the expected average cost between two consecutive satisfactions of a desired property. The existing solutions to this problem are sub-optimal. By leveraging ideas from automata-based model checking and game theory, we provide an optimal solution. We demonstrate the approach on an illustrative example.
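To make the optimization objective concrete, the following minimal sketch computes the average cost between consecutive satisfactions of a property along a finite run prefix. It is an illustration of the quantity only, not the synthesis algorithm of the paper. The function name `average_cost_per_cycle`, the representation of a run as `(state, action, next_state)` triples, and the toy three-state example are all assumptions introduced here.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
Step = Tuple[State, Action, State]  # (state, action, next_state); assumed representation


def average_cost_per_cycle(
    steps: Sequence[Step],
    cost: Dict[Tuple[State, Action], float],
    satisfies: Callable[[State], bool],
) -> float:
    """Total cost along a finite run prefix divided by the number of completed cycles.

    A cycle is counted each time the run enters a state satisfying the
    desired property. Returns infinity if no cycle is completed.
    """
    total = 0.0
    cycles = 0
    for state, action, next_state in steps:
        total += cost[(state, action)]  # non-negative cost of taking `action` in `state`
        if satisfies(next_state):
            cycles += 1
    return total / cycles if cycles > 0 else float("inf")


# Hypothetical toy run: 0 -> 1 -> 2 -> 0, repeated twice, property holds in state 0.
toy_cost = {(0, "a"): 1.0, (1, "a"): 2.0, (2, "a"): 3.0}
toy_steps = [(0, "a", 1), (1, "a", 2), (2, "a", 0)] * 2
toy_value = average_cost_per_cycle(toy_steps, toy_cost, lambda s: s == 0)
```

In the toy run, each pass through the loop accumulates a cost of 6 and ends with one satisfaction of the property, so the value is 6. For a stochastic policy, the quantity of interest is the expectation of the long-run limit of this ratio, which a single finite prefix only approximates.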
