Canonical Capsules: Unsupervised Capsules in Canonical Pose

We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.

[1]  Andrea Tagliasacchi,et al.  Vector Neurons: A General Framework for SO(3)-Equivariant Networks , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  David J. Fleet,et al.  Unsupervised part representation by Flow Capsules , 2020, ICML.

[3]  Luciano Silva,et al.  Learning to Orient Surfaces by Self-supervised Spherical CNNs , 2020, NeurIPS.

[4]  Pascal Fua,et al.  Better Patch Stitching for Parametric Surface Reconstruction , 2020, 2020 International Conference on 3D Vision (3DV).

[5]  Jan Kautz,et al.  DeepGMR: Learning Latent Gaussian Mixture Models for Registration , 2020, ECCV.

[6]  Zihao Wang,et al.  Weakly-supervised 3D Shape Completion in the Wild , 2020, ECCV.

[7]  L. Guibas,et al.  CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations , 2020, NeurIPS.

[8]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[9]  Eduard Trulls,et al.  ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  俊一 甘利 5分で分かる!? 有名論文ナナメ読み:Jacot, Arthor, Gabriel, Franck and Hongler, Clement : Neural Tangent Kernel : Convergence and Generalization in Neural Networks , 2020 .

[11]  Zi Jian Yew,et al.  RPM-Net: Robust Point Matching Using Learned Features , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Richard A. Newcombe,et al.  Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction , 2020, ECCV.

[13]  J. J. Guerrero,et al.  Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets , 2020, ECCV.

[14]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[15]  J. D. Wegner,et al.  Learning Multiview 3D Point Cloud Registration , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Emanuele Menegatti,et al.  Quaternion Equivariant Capsule Networks for 3D Point Clouds , 2019, ECCV.

[17]  Thomas Funkhouser,et al.  Deep Structured Implicit Functions , 2019, ArXiv.

[18]  Geoffrey E. Hinton,et al.  NASA: Neural Articulated Shape Approximation , 2019, ECCV.

[19]  Nitish Srivastava,et al.  Geometric Capsule Autoencoders for 3D Point Clouds , 2019, ArXiv.

[20]  Hao Zhang,et al.  BSP-Net: Generating Compact Meshes via Binary Space Partitioning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Geoffrey E. Hinton,et al.  CvxNet: Learnable Convex Decomposition , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Andrea Vedaldi,et al.  C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Mathieu Aubry,et al.  Learning elementary structures for 3D shape generation and matching , 2019, NeurIPS.

[24]  Yee Whye Teh,et al.  Stacked Capsule Autoencoders , 2019, NeurIPS.

[25]  Chao Chen,et al.  ClusterNet: Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Yue Wang,et al.  Deep Closest Point: Learning Representations for Point Cloud Registration , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Thomas A. Funkhouser,et al.  Learning Shape Templates With Structured Implicit Functions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Federico Tombari,et al.  3D Point Capsule Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yang Liu,et al.  Adaptive O-CNN , 2018, ACM Trans. Graph..

[34]  Jonathan Tompson,et al.  Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning , 2018, NeurIPS.

[35]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[36]  Kaiming He,et al.  Group Normalization , 2018, International Journal of Computer Vision.

[37]  Li Li,et al.  Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds , 2018, ArXiv.

[38]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[40]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Alexander M. Bronstein,et al.  Deformable Shape Completion with Graph Convolutional Autoencoders , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[43]  Yang Liu,et al.  O-CNN , 2017, ACM Trans. Graph..

[44]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[45]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Leonidas J. Guibas,et al.  Learning Shape Abstractions by Assembling Volumetric Primitives , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Ersin Yumer,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[49]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Vladlen Koltun,et al.  Fast Global Registration , 2016, ECCV.

[51]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[53]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[55]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[56]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[57]  Daniel Cohen-Or,et al.  Co-hierarchical analysis of shape structures , 2013, ACM Trans. Graph..

[58]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[59]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[61]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[62]  Ken Shoemake,et al.  Uniform Random Rotations , 1992, Graphics Gems III.

[63]  Geoffrey E. Hinton,et al.  Shape Recognition and Illusory Conjunctions , 1985, IJCAI.

[64]  Geoffrey E. Hinton A Parallel Computation that Assigns Canonical Object-Based Frames of Reference , 1981, IJCAI.

[65]  R. Hartley,et al.  Rotation Averaging , 2012, International Journal of Computer Vision.