LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.

Alberto Camacho | Sheila A. McIlraith | Richard Anthony Valenzano | Toryn Q. Klassen | Rodrigo Toro Icarte
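The abstract describes reward machines only in prose. The Python sketch that follows is a minimal, hypothetical illustration under our own assumptions and is not the authors' code. The names `RewardMachine`, `shaped_reward` and `qrm_update` are introduced here for illustration. Edge guards are plain Python predicates over the set of propositions that are true after an environment step. The example machine for "get coffee, then reach the office" is written by hand rather than translated from an LTL formula.

The sketch shows three ideas from the abstract:

- an RM as a small automaton that emits rewards on its edges;
- a potential-based shaping term whose potential comes from value iteration over the RM's own graph (one plausible construction; the paper's exact potential may differ);
- a tabular Q-learning update that reuses a single environment transition for every RM state.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Label = FrozenSet[str]               # propositions that are true after an environment step
Guard = Callable[[Label], bool]      # propositional condition attached to an RM edge
Edge = Tuple[Guard, int, float]      # (guard, next RM state, reward emitted on the edge)


class RewardMachine:
    """Hypothetical minimal reward machine: finite states, guarded edges, rewards on edges."""

    def __init__(self, initial: int, terminal: Set[int], edges: Dict[int, List[Edge]]):
        self.initial = initial
        self.terminal = terminal
        self.edges = edges
        self.states = sorted(set(edges) | terminal | {initial})

    def step(self, u: int, label: Label) -> Tuple[int, float]:
        """Follow the first edge whose guard holds; otherwise self-loop with reward 0."""
        for guard, u_next, reward in self.edges.get(u, []):
            if guard(label):
                return u_next, reward
        return u, 0.0

    def potentials(self, gamma: float = 0.9, iters: int = 100) -> Dict[int, float]:
        """Value iteration over the RM graph, optimistically assuming any outgoing edge
        can be taken next (self-loops are ignored). Terminal states keep value 0."""
        v = {u: 0.0 for u in self.states}
        for _ in range(iters):
            for u in self.states:
                if u not in self.terminal and self.edges.get(u):
                    v[u] = max(r + gamma * v[u_next] for _, u_next, r in self.edges[u])
        return v


def shaped_reward(rm_reward: float, phi: Dict[int, float], u: int, u_next: int, gamma: float) -> float:
    """Potential-based shaping (Ng, Harada and Russell, 1999): r + gamma * phi(u') - phi(u)."""
    return rm_reward + gamma * phi[u_next] - phi[u]


def qrm_update(q, rm, s, a, label, s_next, actions, alpha=0.1, gamma=0.9):
    """Tabular update reusing one transition (s, a, label, s') for every non-terminal RM state u.
    q[u] maps (env_state, action) -> value; terminal RM states contribute no future value."""
    for u in rm.states:
        if u in rm.terminal:
            continue
        u_next, r = rm.step(u, label)
        future = 0.0 if u_next in rm.terminal else max(q[u_next][(s_next, b)] for b in actions)
        q[u][(s, a)] += alpha * (r + gamma * future - q[u][(s, a)])


# Hand-written example task: "get coffee, then reach the office" (RM states 0 -> 1 -> 2).
rm = RewardMachine(
    initial=0,
    terminal={2},
    edges={
        0: [(lambda label: "coffee" in label, 1, 0.0)],
        1: [(lambda label: "office" in label, 2, 1.0)],
    },
)
phi = rm.potentials(gamma=0.9)  # {0: 0.9, 1: 1.0, 2: 0.0}

if __name__ == "__main__":
    print(rm.step(0, frozenset({"coffee"})))          # (1, 0.0)
    print(rm.step(1, frozenset({"office"})))          # (2, 1.0)
    print(shaped_reward(0.0, phi, 0, 0, gamma=0.9))   # about -0.09: idling in state 0 is penalised
    q = {u: defaultdict(float) for u in rm.states}
    # Seeing "office" teaches Q for RM state 1, even if the agent was actually in state 0.
    qrm_update(q, rm, s=0, a="right", label=frozenset({"office"}), s_next=1, actions=["left", "right"])
    print(q[1][(0, "right")])                         # 0.1
```

With `gamma = 0.9`, moving along either edge of this machine yields a shaped reward of 0. Staying put costs about 0.09 per step in state 0 and 0.1 per step in state 1, which nudges exploration toward progress through the RM. The shaping term has the potential-based form γΦ(u′) − Φ(u), so it should leave optimal policies unchanged in the product of environment states and RM states (Ng, Harada and Russell, 1999). The counterfactual loop in `qrm_update` is one way to exploit the exposed RM structure that the abstract refers to: a single experience updates the value estimates for every RM state at once.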