ReorientDiff: Diffusion Model based Reorientation for Object Manipulation

The ability to manipulate objects into desired configurations is a fundamental requirement for robots to perform a wide range of practical tasks. While certain goals can be achieved by directly picking and placing the objects of interest, most tasks require object reorientation for precise placement. In such scenarios, the object must be reoriented and repositioned into intermediate poses that facilitate accurate placement at the target pose. To this end, we propose ReorientDiff, a reorientation planning method based on a diffusion model. The proposed method uses both visual observations of the scene and goal-specific language prompts to plan intermediate reorientation poses. Specifically, the scene and language-task information are mapped into a joint scene-task representation, which is then used to condition the diffusion model. The diffusion model samples intermediate poses conditioned on this representation using classifier-free guidance, and then uses gradients of learned feasibility-score models for implicit iterative pose refinement. The proposed method is evaluated on a set of YCB objects with a suction gripper, achieving a success rate of 96.5% in simulation. Overall, our study presents a promising approach to the reorientation challenge in manipulation by learning a conditional pose distribution, an effective step towards more generalizable object manipulation. For more results, check out our website: https://utkarshmishra04.github.io/ReorientDiff.
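The abstract describes a sampling loop that combines classifier-free guidance on a scene-task embedding with gradient-based refinement from learned feasibility scores. The sketch below is a minimal illustration of how such a loop could be structured; it is not the authors' implementation, and the network architectures, pose parameterization (position plus quaternion), noise schedule, guidance weight `w`, and refinement step size `lam` are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code): DDPM-style reverse diffusion over a
# pose vector, with classifier-free guidance on a scene-task embedding and a
# feasibility-score gradient applied as an implicit refinement step.
import torch
import torch.nn as nn

POSE_DIM, COND_DIM, T = 7, 512, 100   # pose = position + quaternion (assumed dimensions)

class EpsModel(nn.Module):
    """Noise-prediction network conditioned on the joint scene-task embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE_DIM + COND_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, POSE_DIM))
    def forward(self, x, t, cond):
        t_feat = t.float().view(-1, 1) / T
        return self.net(torch.cat([x, cond, t_feat], dim=-1))

class FeasibilityScore(nn.Module):
    """Stand-in for a learned grasp/placement feasibility model; higher is better."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(POSE_DIM, 128), nn.SiLU(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def posterior_mean(eps_model, x, t, cond, w, betas, alphas_bar):
    # Classifier-free guidance: blend conditional and unconditional noise predictions.
    eps_c = eps_model(x, t, cond)
    eps_u = eps_model(x, t, torch.zeros_like(cond))   # null conditioning
    eps = (1 + w) * eps_c - w * eps_u
    alpha_t, abar_t = 1 - betas[t], alphas_bar[t]
    return (x - betas[t] / torch.sqrt(1 - abar_t) * eps) / torch.sqrt(alpha_t)

def sample_pose(eps_model, feas_model, cond, w=2.0, lam=0.1):
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1 - betas, dim=0)
    x = torch.randn(1, POSE_DIM)
    for t in reversed(range(T)):
        t_idx = torch.tensor([t])
        mean = posterior_mean(eps_model, x, t_idx, cond, w, betas, alphas_bar)
        # Feasibility-guided refinement: nudge the mean along the score gradient.
        x_in = mean.detach().requires_grad_(True)
        grad = torch.autograd.grad(feas_model(x_in).sum(), x_in)[0]
        mean = mean + lam * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # candidate intermediate reorientation pose

# Usage (illustrative): pose = sample_pose(EpsModel(), FeasibilityScore(),
#                                          cond=torch.randn(1, COND_DIM))
```

In this sketch the feasibility gradient plays the role of the iterative pose refinement mentioned in the abstract: each reverse-diffusion step shifts the denoised mean toward poses the learned feasibility models score highly, while classifier-free guidance keeps samples consistent with the scene-task conditioning.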
