Multi-Task Learning with Sequence-Conditioned Transporter Networks

Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose MultiRavens, a new benchmark suite specifically aimed at compositional tasks, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task, long-horizon problems. Our analysis suggests that the new framework not only significantly improves pick-and-place performance on 10 novel multi-task benchmark problems, but also that multi-task learning with weighted sampling can vastly improve learning and agent performance on individual tasks.
