Learning to Coordinate Manipulation Skills via Skill Behavior Diversification

When mastering a complex manipulation task, humans often decompose the task into sub-skills for individual body parts, practice each sub-skill independently, and then execute the sub-skills together. Similarly, a robot with multiple end-effectors can perform a complex task by coordinating the sub-skills of each end-effector. To realize temporal and behavioral coordination of skills, we propose a hierarchical framework that first trains the sub-skills of each end-effector individually with skill behavior diversification, and then learns to coordinate the end-effectors by composing the diverse behaviors of these skills. We demonstrate that the proposed framework efficiently learns sub-skills with diverse behaviors and coordinates them to solve challenging collaborative control tasks, such as picking up a long bar, placing a block inside a container while pushing the container with two robot arms, and pushing a box with two ant agents.
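
The two-stage structure described above lends itself to a compact illustration. Below is a minimal PyTorch sketch of the idea, not the authors' implementation: every name and number (SkillPolicy, MetaPolicy, the dimensions, and the two-arm setup) is an assumption for illustration. Low-level skill policies are conditioned on a behavior code z, the kind of latent a diversity objective would produce, and a high-level meta-policy selects, for each end-effector, which skill to run and which behavior code to feed it.

```python
# Minimal sketch of the two-stage framework described above. All names and
# dimensions here are hypothetical illustrations, not the paper's code.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4      # per-end-effector dims (assumed)
NUM_SKILLS, NUM_BEHAVIORS = 3, 5   # skills per arm, behaviors per skill

class SkillPolicy(nn.Module):
    """Low-level skill for one end-effector, conditioned on a behavior
    code z (assumed to be pre-trained with a diversity reward)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + NUM_BEHAVIORS, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

class MetaPolicy(nn.Module):
    """High-level policy: for each end-effector, pick a skill index and a
    behavior code, realizing temporal and behavioral coordination."""
    def __init__(self, num_agents):
        super().__init__()
        self.num_agents = num_agents
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM * num_agents, 64), nn.ReLU(),
            nn.Linear(64, num_agents * (NUM_SKILLS + NUM_BEHAVIORS)))

    def forward(self, full_state):
        logits = self.net(full_state).view(self.num_agents, -1)
        skill = logits[:, :NUM_SKILLS].argmax(-1)          # which skill to run
        z = torch.softmax(logits[:, NUM_SKILLS:], dim=-1)  # soft behavior code
        return skill, z

# One high-level control step for a two-arm agent (dummy states).
skills = [[SkillPolicy() for _ in range(NUM_SKILLS)] for _ in range(2)]
meta = MetaPolicy(num_agents=2)
states = torch.randn(2, STATE_DIM)
skill_idx, z = meta(states.flatten())
actions = torch.stack([skills[i][skill_idx[i]](states[i], z[i])
                       for i in range(2)])
```

In a full implementation the skill and behavior choices would be sampled from learned distributions and the skills trained beforehand with a diversity objective; the forward pass above only shows how skill selection and behavior selection compose at each high-level step.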
