Paper Information - A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.
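To make the idea of compiling a composable specification into a scalar score more concrete, here is a minimal Python sketch. It is not SPECTRL's syntax or algorithm: the combinator names (`near`, `away_from`, `achieve`, `then`) are hypothetical, and the max/min scoring is one common quantitative semantics assumed here for illustration. It scores whole trajectories and does not perform the automatic reward shaping the abstract mentions.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]
Predicate = Callable[[State], float]            # value > 0 iff the predicate holds
Task = Callable[[Sequence[State]], float]       # value > 0 iff the trajectory satisfies the task


def near(target: State, tol: float) -> Predicate:
    """Hypothetical predicate: positive when the state is within `tol` of `target`."""
    return lambda s: tol - math.dist(s, target)


def away_from(obstacle: State, radius: float) -> Predicate:
    """Hypothetical predicate: positive when the state is farther than `radius` from `obstacle`."""
    return lambda s: math.dist(s, obstacle) - radius


def achieve(goal: Predicate, safe: Predicate) -> Task:
    """Reach `goal` at some step while `safe` holds at every step up to and including it."""
    def score(traj: Sequence[State]) -> float:
        best, safety = -math.inf, math.inf
        for s in traj:
            safety = min(safety, safe(s))           # worst safety margin seen so far
            best = max(best, min(goal(s), safety))  # best goal margin that was still safe
        return best
    return score


def then(first: Task, second: Task) -> Task:
    """Sequential composition: some split point satisfies `first` before it and `second` after it.

    Assumes a non-empty trajectory.
    """
    def score(traj: Sequence[State]) -> float:
        return max(min(first(traj[: k + 1]), second(traj[k:])) for k in range(len(traj)))
    return score


# Assumed example task: visit (2, 0), then (2, 2), always staying clear of an obstacle at (1, 1).
safe = away_from((1.0, 1.0), 0.5)
spec = then(achieve(near((2.0, 0.0), 0.5), safe), achieve(near((2.0, 2.0), 0.5), safe))

good_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
bad_traj = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (2.0, 2.0)]  # passes through the obstacle

good_score = spec(good_traj)
bad_score = spec(bad_traj)
```

Under these assumptions a positive score means the trajectory satisfies the composed task and a negative score means it violates it, so the score could serve as an episode-level reward. The point of the sketch is only that small, reusable pieces (predicates and combinators) can replace a hand-written monolithic reward function.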
