Imitating Task and Motion Planning with Visuomotor Transformers

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial and error. However, common data collection methods, such as human supervision, scale poorly because they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that combining large-scale datasets generated by TAMP supervisors with flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant Transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated-object manipulation, achieving success rates of 70 to 80%. Video results are available at https://mihdalal.github.io/optimus/
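At its core, imitating a TAMP supervisor is supervised learning: the planner produces (observation, action) demonstration pairs, and a policy is fit to reproduce the expert's actions. The sketch below illustrates that idea only, with a synthetic linear stand-in for both the TAMP expert and the paper's Transformer policy; all names and dimensions are hypothetical and do not come from OPTIMUS.

```python
import numpy as np

# Hypothetical illustration of behavior cloning: distill expert
# (observation, action) pairs -- synthetic stand-ins for TAMP
# demonstrations -- into a policy by supervised regression.
# A linear policy replaces the paper's visuomotor Transformer.

rng = np.random.default_rng(0)
obs_dim, act_dim, n_demos = 8, 2, 500

W_expert = rng.normal(size=(obs_dim, act_dim))      # unknown expert mapping
observations = rng.normal(size=(n_demos, obs_dim))  # demo observations
actions = observations @ W_expert                   # expert actions to imitate

# Fit the policy by least squares: argmin_W ||observations @ W - actions||^2
W_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# Imitation error of the cloned policy on the demonstrations
mse = float(np.mean((observations @ W_policy - actions) ** 2))
print(f"imitation MSE: {mse:.2e}")
```

In the paper's setting, the regression target is the same but the policy is a Transformer over image observations, which is why dataset scale and curation matter.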
