Automata Guided Hierarchical Reinforcement Learning for Zero-shot Skill Composition

An obstacle to the wide adoption of (deep) reinforcement learning (RL) in control systems is the large number of interactions with the environment it needs in order to master a skill. The learned skill usually generalizes poorly across domains, and re-training is often necessary when a new task is presented. We present a framework that combines techniques from \textit{formal methods} with \textit{hierarchical reinforcement learning} (HRL). The set of techniques we provide allows tasks to be conveniently specified with logical expressions, learns hierarchical policies (a meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL method, and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as in simulation on a Baxter robot.
