Paper information - Planning with State Abstractions for Non-Markovian Task Specifications

Planning with State Abstractions for Non-Markovian Task Specifications

We often specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command "go to the kitchen before going to the second floor" contains spatial abstraction, given that "floor" consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the word "before". Previous works have separately used Linear Temporal Logic (LTL) to interpret temporal language (such as "before") and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as "kitchen" and "second floor"). To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstraction. The AP-MDP framework translates an LTL specification into its corresponding automaton, creates a product Markov Decision Process (MDP) of that automaton and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of tasks, and this percentage grows as the environment domain gets larger. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expressions, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on a drone, demonstrating that our approach enables a robot to efficiently solve temporal commands at different levels of abstraction.
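To make the product construction concrete, the following is a minimal sketch of planning over the product of an environment and a task automaton for the example command "go to the kitchen before going to the second floor". It is not the AP-MDP algorithm itself: the four-room map, the proposition names, the hand-written automaton, and the helper names `dfa_step` and `plan` are all assumptions made for illustration, transitions are deterministic rather than stochastic, and no abstraction hierarchy is used.

```python
from collections import deque

# Hypothetical environment: which rooms are directly reachable from each room.
ADJACENT = {
    "hall": ["kitchen", "stairs"],
    "kitchen": ["hall"],
    "stairs": ["hall", "bedroom"],
    "bedroom": ["stairs"],
}

# Labeling function: the atomic propositions that hold in each room.
# "bedroom" is assumed to be the only room on the second floor.
LABELS = {
    "hall": set(),
    "kitchen": {"kitchen"},
    "stairs": set(),
    "bedroom": {"second_floor"},
}


def dfa_step(q, props):
    """Hand-written automaton for 'kitchen before second floor'."""
    if q in ("accept", "fail"):
        return q  # both states are absorbing
    if q == "start":
        if "second_floor" in props:
            return "fail"  # reached the second floor too early
        if "kitchen" in props:
            return "seen_kitchen"
        return "start"
    # q == "seen_kitchen"
    return "accept" if "second_floor" in props else "seen_kitchen"


def plan(start_room):
    """Breadth-first search over product states (room, automaton state)."""
    start = (start_room, dfa_step("start", LABELS[start_room]))
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        room, q = node
        if q == "accept":
            # Walk parent links back to the start to recover the room sequence.
            path = []
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if q == "fail":
            continue  # the task can no longer be satisfied from here
        for nxt in ADJACENT[room]:
            succ = (nxt, dfa_step(q, LABELS[nxt]))
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None


print(plan("hall"))
```

Because the automaton state is part of the search state, the same room can appear twice in a plan with different task progress, which is how the non-Markovian ordering constraint becomes Markovian in the product. Starting in "bedroom" returns `None` in this sketch, since the second floor is reached before the kitchen.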
