Paper Information - Deep Reinforcement Learning with Temporal Logics

The combination of data-driven learning methods with formal reasoning has seen a surge of interest, as each area has the potential to bolster the other. For instance, formal methods promise to expand the use of state-of-the-art learning approaches in the direction of certification and sample efficiency. In this work, we propose a deep Reinforcement Learning (RL) method for policy synthesis in unknown environments with continuous state and action spaces, under requirements expressed in Linear Temporal Logic (LTL). We show that this combination extends the applicability of deep RL to complex temporal and memory-dependent policy synthesis goals. We express an LTL specification as a Limit Deterministic Büchi Automaton (LDBA) and synchronise it on-the-fly with the agent/environment. In practice, the LDBA monitors the environment and acts as a modular reward machine for the agent. Accordingly, a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy that maximises the probability of satisfying the given LTL formula. We evaluate our framework on a cart-pole example and on a Mars rover experiment, where we achieve near-perfect success rates, while baselines based on standard RL are shown to fail in practice.
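To make the idea of an automaton acting as a reward machine more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a tiny hand-written automaton for the illustrative task "visit a region labelled a, then a region labelled b", rather than an LDBA generated from an LTL formula. The names `TRANSITIONS`, `ACCEPTING`, `automaton_step` and `synchronised_rewards`, the label strings, and the reward of 1.0 in accepting states are all assumptions made for illustration.

```python
# Hypothetical automaton for "visit a, then visit b".
# TRANSITIONS[state][label] gives the next state; any other label self-loops.
TRANSITIONS = {
    0: {"a": 1},   # waiting for a
    1: {"b": 2},   # a seen, waiting for b
    2: {},         # task complete (accepting)
}
ACCEPTING = {2}


def automaton_step(q, label):
    """Advance the automaton by one environment label."""
    return TRANSITIONS[q].get(label, q)


def synchronised_rewards(labels, q0=0):
    """Run the automaton in lockstep with a stream of environment labels.

    Returns one (automaton_state, reward) pair per step. The reward is 1.0
    while the automaton is in an accepting state and 0.0 otherwise; this
    reward scheme is an assumption of the sketch.
    """
    q = q0
    trace = []
    for label in labels:
        q = automaton_step(q, label)
        reward = 1.0 if q in ACCEPTING else 0.0
        trace.append((q, reward))
    return trace


# Labels an agent might observe along one trajectory (made-up data).
trace = synchronised_rewards(["none", "a", "none", "b", "none"])
print(trace)
```

In this sketch the automaton state serves as the memory that the raw environment state lacks: the same observation can earn a different reward depending on which part of the task has already been completed. A learner could also use that state to choose which policy module is active, which is one plausible reading of "modular" in the abstract.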
