Learning to Assemble: Estimating 6D Poses for Robotic Object-Object Manipulation

In this letter, we propose a robotic vision task whose goal is to enable robots to execute complex assembly tasks in unstructured environments using a camera as the primary sensing device. We formulate the task as an instance of 6D pose estimation of template geometries, to which manipulation objects should be connected. In contrast to the standard 6D pose estimation task, this requires reasoning about local geometry that is surrounded by arbitrary context, such as a power outlet embedded in a wall. We propose a deep-learning-based approach to solve this task, alongside a novel dataset that will enable future work in this direction and can serve as a benchmark. We experimentally show that state-of-the-art 6D pose estimation methods alone are not sufficient to solve the task, but that our training procedure significantly improves the performance of deep learning techniques in this context.

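For concreteness, a 6D pose here means a rotation R in SO(3) and a translation t in R^3 mapping the template geometry's model frame into the camera frame. The abstract does not state how pose estimates are scored; the short Python sketch below assumes the commonly used ADD metric (average distance between model points transformed by the estimated pose and by the ground-truth pose) purely as an illustrative example, not as the letter's actual evaluation protocol.

import numpy as np

def add_error(R_est, t_est, R_gt, t_gt, model_points):
    # Average distance (ADD) between template model points transformed
    # by the estimated pose and by the ground-truth pose.
    #   R_est, R_gt: (3, 3) rotation matrices
    #   t_est, t_gt: (3,) translation vectors in meters
    #   model_points: (N, 3) points sampled from the template geometry
    p_est = model_points @ R_est.T + t_est
    p_gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(p_est - p_gt, axis=1).mean()

# Hypothetical usage: an estimate is often accepted if the ADD error
# stays below a fraction (e.g., 10%) of the template's diameter.
rng = np.random.default_rng(0)
points = rng.uniform(-0.05, 0.05, size=(500, 3))   # roughly 10 cm template
R, t = np.eye(3), np.array([0.0, 0.0, 0.5])
print(add_error(R, t + 1e-3, R, t, points))        # small translation offset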