ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation

Recently, category-level 6D object pose estimation has achieved significant improvements with the development of reconstructing canonical 3D representations. However, the reconstruction quality of existing methods is still far from excellent. In this paper, we propose a novel Adversarial Canonical Representation Reconstruction Network named ACR-Pose. ACR-Pose consists of a Reconstructor and a Discriminator. The Reconstructor is primarily composed of two novel sub-modules: Pose-Irrelevant Module (PIM) and Relational Reconstruction Module (RRM). PIM tends to learn canonical-related features to make the Reconstructor insensitive to rotation and translation, while RRM explores essential relational information between different input modalities to generate high-quality features. Subsequently, a Discriminator is employed to guide the Reconstructor to generate realistic canonical representations. The Reconstructor and the Discriminator learn to optimize through adversarial training. Experimental results on the prevalent NOCS-CAMERA and NOCS-REAL datasets demonstrate that our method achieves state-of-the-art performance.

[1]  Cewu Lu,et al.  PRIN/SPRIN: On Extracting Point-Wise Rotation Invariant Features , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ernesto Damiani,et al.  Augmented reality technologies, systems and applications , 2010, Multimedia Tools and Applications.

[3]  Fuxin Li,et al.  PointConv: Deep Convolutional Networks on 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Slobodan Ilic,et al.  DPOD: 6D Pose Object Detector and Refiner , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Ronald Azuma,et al.  A Survey of Augmented Reality , 1997, Presence: Teleoperators & Virtual Environments.

[6]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[7]  Ashutosh Saxena,et al.  Robotic Grasping of Novel Objects using Vision , 2008, Int. J. Robotics Res..

[8]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[9]  Ales Leonardis,et al.  FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Timothy Patten,et al.  Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Hongyan Liu,et al.  Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview , 2021, ArXiv.

[13]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[14]  Daniel Cohen-Or,et al.  PU-GAN: A Point Cloud Upsampling Adversarial Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Ali Etemad,et al.  Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting , 2021, European Conference on Computer Vision.

[16]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Wei Sun,et al.  PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Gim Hee Lee,et al.  Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation , 2020, ECCV.

[20]  Haoqiang Fan,et al.  FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[22]  J. Kumar,et al.  Recent Development of Augmented Reality in Surgery: A Review , 2017, Journal of healthcare engineering.

[23]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[24]  Hongyan Liu,et al.  Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition , 2021, ArXiv.

[25]  Jonathan T. Barron,et al.  iNeRF: Inverting Neural Radiance Fields for Pose Estimation , 2020, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ruigang Yang,et al.  ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xiangyang Ji,et al.  CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Jiaru Song,et al.  HybridPose: 6D Object Pose Estimation Under Hybrid Representations , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Tae-Kyun Kim,et al.  Introducing Pose Consistency and Warp-Alignment for Self-Supervised 6D Object Pose Estimation in Color Images , 2020, 2020 International Conference on 3D Vision (3DV).

[32]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[33]  Bo Dai,et al.  Unsupervised 3D Shape Completion through GAN Inversion , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Kai Xu,et al.  Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yi Li,et al.  DeepIM: Deep Iterative Matching for 6D Pose Estimation , 2018, International Journal of Computer Vision.

[36]  Chi-Keung Tang,et al.  GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision , 2020, ECCV.

[37]  Nassir Navab,et al.  Deep Model-Based 6D Pose Refinement in RGB , 2018, ECCV.

[38]  Kui Jia,et al.  DualPoseNet: Category-level 6D Object Pose and Size Estimation Using Dual Pose Network with Refined Learning of Pose Consistency , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Byeong-Uk Lee,et al.  Category-Level Metric Scale Object Shape and Pose Estimation , 2021, IEEE Robotics and Automation Letters.

[40]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[41]  Yang Yu,et al.  Unsupervised Representation Learning with Deep Convolutional Neural Network for Remote Sensing Images , 2017, ICIG.

[42]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[43]  Vijay Kumar,et al.  Robotic grasping and contact: a review , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).

[44]  Wenqiang Zhang,et al.  EfficientPose: Efficient human pose estimation with neural architecture search , 2020, Computational Visual Media.

[45]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[46]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Michael J. Black,et al.  VIBE: Video Inference for Human Body Pose and Shape Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Ming-Yu Liu,et al.  Coupled Generative Adversarial Networks , 2016, NIPS.

[51]  Hujun Bao,et al.  PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[53]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).