Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

This letter proposes a novel reinforcement learning method for synthesizing a control policy that satisfies a control specification given as a linear temporal logic (LTL) formula. We assume that the controlled system is modeled by a Markov decision process (MDP). The specification is converted into a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets, which accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records previous visits to the accepting sets. We then take the product of the augmented LDGBA and the MDP and define a reward function on it: the agent receives a reward whenever a state transition belongs to an accepting set that has not been visited for a certain number of steps. Consequently, the sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method learns an optimal policy when the discount factor is sufficiently close to one.
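
To make the reward scheme concrete, here is a minimal Python sketch of how an augmented product state and the visit-based reward might be organized. This is not the authors' implementation; `ProductState`, `visit_memory`, `horizon`, and `r_accept` are illustrative names, and the paper defines the augmentation and reward formally on the product of the augmented LDGBA and the MDP.

```python
# Illustrative sketch (assumed names, not the paper's formal construction):
# the product state carries a memory vector counting, for each accepting set,
# how many steps have passed since it was last visited. A transition earns a
# reward when it lies in an accepting set whose counter is at least `horizon`.

from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ProductState:
    mdp_state: int                 # current MDP state
    ldgba_state: int               # current LDGBA state
    visit_memory: Tuple[int, ...]  # steps since each accepting set was last visited


def reward(prev: ProductState,
           transition_accepting: FrozenSet[int],
           horizon: int = 10,
           r_accept: float = 1.0) -> float:
    """Return a reward if the transition belongs to an accepting set that the
    memory marks as unvisited for at least `horizon` steps."""
    for j in transition_accepting:
        if prev.visit_memory[j] >= horizon:
            return r_accept
    return 0.0


def update_memory(prev: ProductState,
                  transition_accepting: FrozenSet[int]) -> Tuple[int, ...]:
    """Augmentation step: reset counters of accepting sets just visited,
    and age all other counters by one step."""
    return tuple(0 if j in transition_accepting else c + 1
                 for j, c in enumerate(prev.visit_memory))
```

Under this scheme, rewards recur each time a stale accepting set is revisited, so a standard Q-learning update over the product MDP with a discount factor close to one can favor policies that keep circulating through all accepting sets, which is the behavior the letter analyzes.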
