Imitating Task and Motion Planning with Visuomotor Transformers

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial and error. However, common data collection methods, such as human supervision, scale poorly because they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that combining large-scale datasets generated by TAMP supervisors with flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant Transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated-object manipulation, achieving success rates of 70 to 80%. Video results are available at https://mihdalal.github.io/optimus/
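At its core, imitating a TAMP supervisor is supervised learning: the planner produces (observation, action) demonstration pairs, and a policy is fit to reproduce the expert's actions. The sketch below illustrates that idea only, with a synthetic linear stand-in for both the TAMP expert and the paper's Transformer policy; all names and dimensions are hypothetical and do not come from OPTIMUS.

```python
import numpy as np

# Hypothetical illustration of behavior cloning: distill expert
# (observation, action) pairs -- synthetic stand-ins for TAMP
# demonstrations -- into a policy by supervised regression.
# A linear policy replaces the paper's visuomotor Transformer.

rng = np.random.default_rng(0)
obs_dim, act_dim, n_demos = 8, 2, 500

W_expert = rng.normal(size=(obs_dim, act_dim))      # unknown expert mapping
observations = rng.normal(size=(n_demos, obs_dim))  # demo observations
actions = observations @ W_expert                   # expert actions to imitate

# Fit the policy by least squares: argmin_W ||observations @ W - actions||^2
W_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# Imitation error of the cloned policy on the demonstrations
mse = float(np.mean((observations @ W_policy - actions) ** 2))
print(f"imitation MSE: {mse:.2e}")
```

In the paper's setting, the regression target is the same but the policy is a Transformer over image observations, which is why dataset scale and curation matter.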
