ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes

Progress in 3D object understanding has relied on manually “canonicalized” shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.
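To make the role of equivariance concrete, the following is a minimal sketch of why a rotation-equivariant frame yields a pose-invariant canonical shape. It is not ConDor's network: the hand-crafted `equivariant_frame` below is a hypothetical stand-in for a learned predictor, and the names `equivariant_frame`, `canonicalize`, and `random_rotation` are assumptions introduced only for illustration. The sketch handles full shapes only; consistency across partial shapes is what the learned method is meant to provide and is not captured here.

```python
import numpy as np


def equivariant_frame(points):
    """Hypothetical hand-crafted frame estimator (stand-in for a learned one).

    Returns the centroid and a 3x3 rotation whose columns are the frame axes.
    Rotating the input by Q rotates the returned frame by Q as well.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid  # removes the translation
    radii = np.linalg.norm(centered, axis=1, keepdims=True)
    # Radius-weighted sums are permutation-invariant and rotate with the input.
    v1 = (radii * centered).sum(axis=0)
    v2 = (radii ** 2 * centered).sum(axis=0)
    # Gram-Schmidt turns the two vectors into an orthonormal, right-handed frame.
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - (v2 @ e1) * e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)
    return centroid, np.stack([e1, e2, e3], axis=1)


def canonicalize(points):
    """Express the points in the coordinates of their own equivariant frame."""
    centroid, frame = equivariant_frame(points)
    return (points - centroid) @ frame


def random_rotation(rng):
    """Draw a proper rotation matrix (determinant +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(256, 3))
    rotation = random_rotation(rng)
    translation = rng.normal(size=3)
    posed = cloud @ rotation.T + translation  # same shape, arbitrary pose
    print(np.allclose(canonicalize(cloud), canonicalize(posed)))
```

Because the frame rotates together with the input, multiplying the centered points by the frame cancels the unknown rotation, so both copies of the shape map to the same canonical coordinates. A learned equivariant network plays the same structural role, with self-supervised losses deciding which frame is chosen across a shape collection.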
