Automata Guided Hierarchical Reinforcement Learning for Zero-shot Skill Composition

An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large number of interactions with the environment in order to master a skill. The learned skill usually generalizes poorly across domains, and re-training is often necessary when a new task is presented. We present a framework that combines techniques from formal methods with hierarchical reinforcement learning (HRL). The set of techniques we provide allows tasks to be conveniently specified with logical expressions. It learns hierarchical policies (a meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL method, and it can construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as in a simulation of a Baxter robot.

Calin Belta | Xiao Li | Yao Ma
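The abstract describes tasks written as logical expressions and a hierarchy in which a meta-controller and low-level controllers are guided by an automaton. The Python below is a minimal sketch of that control loop under stated assumptions. It is not the authors' implementation.

- **Automaton.** It encodes a task like "reach A, then reach B" (roughly F(A ∧ F B) in temporal-logic notation). It is written by hand rather than translated from a formula.
- **Hypothetical names.** `FSA`, `at`, `intrinsic_reward`, `greedy_controller`, `run` and the cells `A` and `B` are all illustrative.
- **Intrinsic reward.** This is assumed to be a sparse 0/1 signal for satisfying the guard of the assigned automaton edge.
- **Low-level controllers.** These are hand-coded stand-ins for learned policies.
- **Meta-controller.** It always takes the first outgoing edge. In the framework described above, both levels are learned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Tuple[int, int]          # grid cell (row, col)
Guard = Callable[[State], bool]  # predicate over environment states
Policy = Callable[[State], State]


@dataclass
class FSA:
    """Hand-written automaton for a task spec (hypothetical encoding).

    edges[q] lists (guard, next_node) pairs; self-loops are implicit, so the
    automaton stays at q whenever no outgoing guard holds.
    """
    initial: str
    accepting: FrozenSet[str]
    edges: Dict[str, List[Tuple[Guard, str]]]

    def step(self, q: str, s: State) -> str:
        for guard, q_next in self.edges.get(q, []):
            if guard(s):
                return q_next
        return q


def at(cell: State) -> Guard:
    """Atomic proposition: 'the agent is in `cell`'."""
    return lambda s: s == cell


def intrinsic_reward(guard: Guard, s_next: State) -> float:
    # Assumed sparse reward: 1 when the low-level controller satisfies the guard
    # of the edge it was assigned, 0 otherwise.
    return 1.0 if guard(s_next) else 0.0


def greedy_controller(target: State) -> Policy:
    """Stand-in for a learned low-level policy: move one cell toward `target`."""
    def act(s: State) -> State:
        (r, c), (tr, tc) = s, target
        if r != tr:
            return (r + (1 if tr > r else -1), c)
        if c != tc:
            return (r, c + (1 if tc > c else -1))
        return s
    return act


def run(fsa: FSA, controllers: Dict[Tuple[str, str], Policy],
        s: State, max_steps: int = 100):
    """Hierarchical rollout guided by the automaton.

    Trivial meta-controller: at node q it always picks the first outgoing edge
    and lets that edge's controller act until the automaton leaves q.
    """
    q, trace, ret = fsa.initial, [s], 0.0
    for _ in range(max_steps):
        if q in fsa.accepting:
            break
        guard, q_target = fsa.edges[q][0]
        s = controllers[(q, q_target)](s)
        ret += intrinsic_reward(guard, s)
        q = fsa.step(q, s)
        trace.append(s)
    return q, trace, ret


# Task: "reach A, then reach B", with one low-level controller per edge.
A, B = (2, 0), (2, 3)
fsa = FSA(initial="q0", accepting=frozenset({"qf"}),
          edges={"q0": [(at(A), "q1")], "q1": [(at(B), "qf")]})
controllers = {("q0", "q1"): greedy_controller(A),
               ("q1", "qf"): greedy_controller(B)}
q, trace, ret = run(fsa, controllers, (0, 0))
print(q, trace, ret)  # qf [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)] 2.0
```

Running it prints three things: the final automaton node, the visited cells and the accumulated intrinsic reward. Ending at the accepting node `qf` means the trace satisfies the task.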