Category-Level Metric Scale Object Shape and Pose Estimation

Advances in deep learning recognition have led to accurate object detection with 2D images. However, these 2D perception methods are insufficient for complete 3D world information. Concurrently, advanced 3D shape estimation approaches focus on the shape itself, without considering metric scale. These methods cannot determine the accurate location and orientation of objects. To tackle this problem, we propose a framework that jointly estimates a metric scale shape and pose from a single RGB image. Our framework has two branches: the Metric Scale Object Shape branch (MSOS) and the Normalized Object Coordinate Space branch (NOCS). The MSOS branch estimates the metric scale shape observed in the camera coordinates. The NOCS branch predicts the normalized object coordinate space (NOCS) map and performs similarity transformation with the rendered depth map from a predicted metric scale mesh to obtain 6D pose and size. Additionally, we introduce the Normalized Object Center Estimation (NOCE) to estimate the geometrically aligned distance from the camera to the object center. We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.

[1]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[2]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[3]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  R. Venkatesh Babu,et al.  3D-PSRNet: Part Segmented 3D Point Cloud Reconstruction From a Single Image , 2018, ECCV Workshops.

[5]  Jie Song,et al.  Category Level Object Pose Estimation via Neural Analysis-by-Synthesis , 2020, ECCV.

[6]  Hujun Bao,et al.  PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[8]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Silvio Savarese,et al.  DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[10]  Peter I. Corke,et al.  Cartman: The Low-Cost Cartesian Manipulator that Won the Amazon Robotics Challenge , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Jing Xu,et al.  Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Xiangyang Ji,et al.  CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[14]  Jaakko Lehtinen,et al.  Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer , 2019, NeurIPS.

[15]  Timothy Patten,et al.  Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[17]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[19]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Gim Hee Lee,et al.  Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation , 2020, ECCV.

[21]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[22]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jitendra Malik,et al.  Mesh R-CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[26]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[27]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Slobodan Ilic,et al.  DPOD: 6D Pose Object Detector and Refiner , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Xiaoguang Han,et al.  Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[31]  Kai Xu,et al.  Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[33]  Federico Tombari,et al.  Self6D: Self-Supervised Monocular 6D Object Pose Estimation , 2020, ECCV.