The Logical Options Framework

Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing each task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. Finally, we show how LOF's learned policies can be composed to satisfy unseen tasks with only 10–50 retraining steps on our benchmarks. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.

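To make the approach concrete, the sketch below illustrates the general pattern of planning over a task automaton with per-subgoal option policies. It is a minimal, hypothetical Python sketch and not the paper's implementation: the `TaskAutomaton` and `Option` classes, the gym-style `env` interface, and the `plan` callback are assumptions made for illustration, with `plan` standing in for the high-level planning (e.g., value iteration over the product of automaton and environment states) that LOF performs.

```python
# Illustrative sketch only: a toy pattern for executing subgoal options under a
# task automaton. All names here are hypothetical, not the LOF paper's code.

from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Set


@dataclass
class TaskAutomaton:
    """Finite-state machine over propositions (e.g., derived from an LTL task)."""
    initial: str
    accepting: Set[str]
    # transitions[(automaton_state, proposition)] -> next automaton state
    transitions: Dict[Tuple[str, str], str]

    def step(self, state: str, proposition: str) -> str:
        # Stay in place if no transition is defined for this proposition.
        return self.transitions.get((state, proposition), state)


@dataclass
class Option:
    """A subgoal-reaching policy with a termination condition."""
    policy: Callable[[object], object]       # env state -> action
    terminates: Callable[[object], bool]     # env state -> subgoal reached?


def run_episode(env, automaton: TaskAutomaton, options: Dict[str, Option],
                plan: Callable[[str], str], max_steps: int = 1000) -> bool:
    """Execute options chosen by a high-level plan over the automaton.

    `plan(q)` returns which proposition (subgoal) to pursue from automaton
    state q; a gym-style env with reset()/step() is assumed.
    """
    q = automaton.initial
    s = env.reset()
    for _ in range(max_steps):
        subgoal = plan(q)                  # pick the next subgoal for state q
        option = options[subgoal]
        # Roll out the chosen option until its termination condition fires.
        while not option.terminates(s):
            s, _, done, _ = env.step(option.policy(s))
            if done:
                return q in automaton.accepting
        q = automaton.step(q, subgoal)     # advance the task automaton
        if q in automaton.accepting:
            return True                    # task satisfied
    return False
```

In this sketch, composing policies for a new task amounts to swapping in a different `TaskAutomaton` and replanning over it while reusing the same per-subgoal options, which mirrors the composability claim in the abstract.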