Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require ever-larger initial state distributions to be covered as more policies are sequenced, and are thus limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method is the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.
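
To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of terminal state regularization. It is an illustrative assumption, not the paper's actual implementation: all names (TerminalStateDiscriminator, good_initiation_states, beta) are invented for exposition. A GAN-style discriminator is trained to separate states from which the next skill is known to succeed from the terminal states the current policy actually produces, and its score augments the task reward at episode end, pushing the policy to terminate inside the next skill's initiation set instead of forcing the next skill to widen its initial state distribution.

    # Hypothetical sketch of adversarial terminal state regularization.
    # Assumes PyTorch and a generic RL training loop; names are illustrative.
    import torch
    import torch.nn as nn

    class TerminalStateDiscriminator(nn.Module):
        """Scores whether a state looks like a valid starting state
        for the next skill in the chain."""
        def __init__(self, state_dim, hidden_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, state):
            return self.net(state)  # raw logit

    def discriminator_loss(disc, good_initiation_states, policy_terminal_states):
        """Standard GAN-style binary cross-entropy: states the next skill can
        start from are labeled real (1), the current policy's terminal
        states are labeled fake (0)."""
        bce = nn.BCEWithLogitsLoss()
        real = bce(disc(good_initiation_states),
                   torch.ones(len(good_initiation_states), 1))
        fake = bce(disc(policy_terminal_states),
                   torch.zeros(len(policy_terminal_states), 1))
        return real + fake

    def regularized_terminal_reward(disc, terminal_state, env_reward, beta=1.0):
        """Augment the task reward at episode end with the discriminator score,
        so the policy is rewarded for ending where the next skill can begin."""
        with torch.no_grad():
            score = torch.sigmoid(disc(terminal_state.unsqueeze(0))).item()
        return env_reward + beta * score

In this sketch, the regularization weight beta trades off task success against compatibility with the downstream skill; the key design choice is that the discriminator shapes only the terminal state distribution of each policy, so the burden of chaining does not grow with the number of skills in the sequence.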
