Paper information - Optimal control of MDPs with temporal logic constraints

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a correctness specification expressed as a formula of Linear Temporal Logic, while at the same time minimizing the expected average cost between two consecutive satisfactions of a desired property. The existing solutions to this problem are sub-optimal. By leveraging ideas from automata-based model checking and game theory, we provide an optimal solution. We demonstrate the approach on an illustrative example.
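The objective named in the abstract, the expected average cost between two consecutive satisfactions of a desired property, can be illustrated on a toy case. The following is a minimal sketch, not the paper's synthesis algorithm: it only evaluates the objective for one fixed policy, assuming that policy induces an ergodic (irreducible, aperiodic) Markov chain. The transition matrix `P`, the cost vector `cost`, the set `targets` of states where the property is satisfied, and both function names are hypothetical. It relies on the standard renewal-reward identity: average cost per cycle equals long-run cost per step divided by long-run frequency of visits to the target states.

```python
def stationary_distribution(P, iterations=500):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One step of pi <- pi * P (row vector times transition matrix).
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_cost_per_cycle(P, cost, targets):
    """Long-run average cost accumulated between consecutive visits to `targets`.

    Assumes the chain is ergodic and that `targets` has positive stationary mass.
    """
    pi = stationary_distribution(P)
    cost_per_step = sum(p * c for p, c in zip(pi, cost))
    cycles_per_step = sum(pi[s] for s in targets)
    return cost_per_step / cycles_per_step


# Hypothetical 3-state chain induced by some fixed policy (illustrative values only).
P = [
    [0.0, 1.0, 0.0],  # state 0 always moves to state 1
    [0.5, 0.0, 0.5],  # state 1 returns to 0 or reaches 2 with equal probability
    [1.0, 0.0, 0.0],  # state 2 always moves to state 0
]
cost = [1.0, 2.0, 4.0]  # non-negative cost incurred in each state
targets = {2}           # states where the desired property is satisfied

print(round(average_cost_per_cycle(P, cost, targets), 6))  # 10.0
```

In this example the stationary distribution is (0.4, 0.4, 0.2), so the cost per step is 2.0 and a cycle completes every 5 steps on average, giving 10.0 per cycle. Finding the policy that minimizes this quantity subject to the Linear Temporal Logic specification is the problem the paper addresses; the sketch does not attempt that.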

[1] Thomas Wilke, et al. Automata, logics, and infinite games: a guide to current research, 2002.

[2] Thomas Wilke, et al. Automata, Logics, and Infinite Games, 2002, Lecture Notes in Computer Science.

[3] Zohar Manna, et al. Formal verification of probabilistic systems, 1997.

[4] Calin Belta, et al. Optimal receding horizon control for finite deterministic systems with temporal logic constraints, 2013, 2013 American Control Conference.

[5] Leslie Pack Kaelbling, et al. Collision Avoidance for Unmanned Aircraft using Markov Decision Processes, 2010.

[6] Calin Belta, et al. MDP optimal control under temporal logic constraints, 2011, IEEE Conference on Decision and Control and European Control Conference.

[7] J. Filar, et al. Competitive Markov Decision Processes, 1996.

[8] Calin Belta, et al. LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees, 2011, ArXiv.

[10] Christel Baier, et al. Principles of model checking, 2008.

[11] Calin Belta, et al. Temporal Logic Motion Planning and Control With Probabilistic Satisfaction Guarantees, 2012, IEEE Transactions on Robotics.

[12] Yushan Chen, et al. LTL robot motion control based on automata learning of environmental dynamics, 2012, 2012 IEEE International Conference on Robotics and Automation.

[13] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.

[14] Krishnendu Chatterjee, et al. Games and Markov Decision Processes with Mean-Payoff Parity and Energy Parity Objectives, 2011, MEMICS.

[15] Krzysztof R. Apt, et al. Lectures in Game Theory for Computer Scientists, 2011.

[16] Christel Baier, et al. PROBMELA: a modeling language for communicating probabilistic processes, 2004, Proceedings. Second ACM and IEEE International Conference on Formal Methods and Models for Co-Design, 2004. MEMOCODE '04.

[17] Krishnendu Chatterjee, et al. Energy and Mean-Payoff Parity Markov Decision Processes, 2011, MFCS.

[18] Thierry Siméon, et al. The Stochastic Motion Roadmap: A Sampling Framework for Planning with Markov Motion Uncertainty, 2007, Robotics: Science and Systems.

[19] Ivana Cerná, et al. Attraction-based receding horizon path planning with temporal logic constraints, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[20] C. Baier, et al. Experiments with Deterministic ω-Automata for Formulas of Linear Temporal Logic, 2005.

[21] Mihalis Yannakakis, et al. The complexity of probabilistic verification, 1995, JACM.

[22] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[23] Berndt Farwer, et al. ω-automata, 2002.

[24] Krishnendu Chatterjee, et al. Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification, 2011, SODA '11.