A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.

[1]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[2]  Garvit Juniwal,et al.  Robust online monitoring of signal temporal logic , 2017, Formal Methods Syst. Des..

[3]  Lydia E. Kavraki,et al.  Reactive synthesis for finite tasks under resource constraints , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4]  Dan Klein,et al.  Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[6]  Krishnendu Chatterjee,et al.  Graph Games and Reactive Synthesis , 2018, Handbook of Model Checking.

[7]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[8]  Dejan Nickovic,et al.  Monitoring properties of analog and mixed-signal circuits , 2012, International Journal on Software Tools for Technology Transfer.

[9]  Calin Belta,et al.  Reinforcement learning with temporal logic rewards , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[11]  Amir Pnueli,et al.  The temporal logic of programs , 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[12]  Sheila A. McIlraith,et al.  Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning , 2018, ICML.

[13]  Benjamin Recht,et al.  Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.

[14]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[15]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[16]  Benjamin Recht,et al.  Simple random search provides a competitive approach to reinforcement learning , 2018, ArXiv.

[17]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[18]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[19]  Hadas Kress-Gazit,et al.  Temporal-Logic-Based Reactive Mission and Motion Planning , 2009, IEEE Transactions on Robotics.

[20]  Ufuk Topcu,et al.  Constrained Cross-Entropy Method for Safe Reinforcement Learning , 2020, IEEE Transactions on Automatic Control.

[21]  Garvit Juniwal,et al.  Robust online monitoring of signal temporal logic , 2015, Formal Methods in System Design.

[22]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[23]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[24]  Pierre Wolper,et al.  Reasoning About Infinite Computations , 1994, Inf. Comput..

[25]  Russ Tedrake,et al.  Efficient Bipedal Robots Based on Passive-Dynamic Walkers , 2005, Science.

[26]  Calin Belta,et al.  Traffic Network Control From Temporal Logic Specifications , 2014, IEEE Transactions on Control of Network Systems.

[27]  Ufuk Topcu,et al.  Receding Horizon Temporal Logic Planning , 2012, IEEE Transactions on Automatic Control.

[28]  George J. Pappas,et al.  Robustness of temporal logic specifications for continuous-time signals , 2009, Theor. Comput. Sci..