Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

This paper presents a model-free reinforcement learning (RL) algorithm that synthesizes a control policy maximizing the probability of satisfying complex tasks expressed as linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process (PL-MDP) with unknown transition probabilities and probabilistic labeling functions. The LTL task specification is converted into a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets, which keeps rewards dense during learning. To overcome the difficulties of applying a conventional LDGBA directly, the novelty of this work is an embedded LDGBA (E-LDGBA) built with a synchronous tracking-frontier function, which records the accepting sets of the LDGBA not yet visited in each round of the repeated visiting pattern. With appropriately designed state-dependent reward and discount functions, rigorous analysis shows that any method that optimizes the expected discounted return of the RL-based approach is guaranteed to find an optimal policy that maximizes the satisfaction probability of the LTL specification. Building on this result, we develop a model-free RL-based motion planning strategy that generates the optimal policy. Simulation and experimental results demonstrate the effectiveness of the RL-based control synthesis.
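To make the tracking-frontier idea concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's exact E-LDGBA construction. It assumes the frontier is the set of indices of LDGBA accepting sets not yet visited in the current round, that the frontier resets to all sets as soon as the last one is visited, that reward and discount depend only on whether a step hits a frontier set, and that the learner is tabular Q-learning over product states. The names `update_frontier`, `reward_and_discount`, and `q_update`, and the parameters `gamma`, `gamma_b`, and `alpha`, are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

State = Hashable
QTable = Dict[Tuple[Hashable, Hashable], float]


def update_frontier(q: State,
                    frontier: FrozenSet[int],
                    accepting_sets: Sequence[FrozenSet[State]]) -> Tuple[FrozenSet[int], bool]:
    """Hypothetical tracking-frontier update (simplified, not the paper's definition).

    `frontier` holds indices of accepting sets not yet visited in this round.
    Visiting automaton state `q` removes every frontier set that contains `q`.
    Returns (new_frontier, hit), where `hit` is True if `q` reached a
    not-yet-visited accepting set. An emptied frontier resets to all sets.
    """
    hit = frozenset(i for i in frontier if q in accepting_sets[i])
    if not hit:
        return frontier, False
    remaining = frontier - hit
    if not remaining:
        # Round complete: every accepting set has been visited once.
        remaining = frozenset(range(len(accepting_sets)))
    return remaining, True


def reward_and_discount(hit: bool, gamma: float = 0.99,
                        gamma_b: float = 0.9) -> Tuple[float, float]:
    """Hypothetical state-dependent scheme: a frontier hit pays 1 - gamma_b and
    is discounted by gamma_b; every other step pays 0 and is discounted by gamma."""
    if hit:
        return 1.0 - gamma_b, gamma_b
    return 0.0, gamma


def q_update(Q: QTable, s: Hashable, a: Hashable, r: float, disc: float,
             s_next: Hashable, actions: Sequence[Hashable], alpha: float = 0.1) -> None:
    """Tabular Q-learning step with a per-transition discount `disc`.

    In use, `s` would be a product state such as (mdp_state, automaton_state, frontier).
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + disc * best_next - old)


if __name__ == "__main__":
    F = [frozenset({"q1"}), frozenset({"q2"})]
    T = frozenset({0, 1})
    T, hit = update_frontier("q1", T, F)  # frontier becomes {1}, hit is True
    T, hit = update_frontier("q1", T, F)  # already visited this round: hit is False
    T, hit = update_frontier("q2", T, F)  # round complete: frontier resets to {0, 1}
    print(sorted(T), hit, reward_and_discount(hit))
```

Because the frontier is part of the learner's state, revisiting an accepting set already seen in the current round earns nothing, which pushes the policy to cycle through every accepting set instead of repeatedly visiting just one.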