Paper information - Canonical Capsules: Unsupervised Capsules in Canonical Pose

We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
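The abstract's central step, aggregating attention masks into keypoints that follow the object under rotation, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random `logits` stand in for the output of a rotation-invariant attention network, the softmax over points is an assumed normalization, and the names `random_rotation` and `keypoints_from_attention` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the column signs of the factorization
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def keypoints_from_attention(points, logits):
    """Aggregate per-point attention into one keypoint per capsule.

    points: (N, 3) point cloud.
    logits: (N, K) unnormalized attention scores (assumed network output).
    Returns (K, 3) keypoints as attention-weighted averages of the points.
    """
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)   # softmax over the N points
    return w.T @ points


rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))
# Assumption: the attention is the same for the object and its rotated copy.
logits = rng.normal(size=(128, 4))

R = random_rotation(rng)
kp = keypoints_from_attention(points, logits)
kp_rot = keypoints_from_attention(points @ R.T, logits)

# With rotation-invariant attention, the keypoints rotate with the object.
equivariance_error = np.abs(kp_rot - kp @ R.T).max()
print("max equivariance error:", equivariance_error)
```

In this toy setting the error is at floating-point precision because the attention is invariant by construction. When training on pairs of randomly rotated objects, a mismatch of this kind is what a self-supervised loss could penalize, so that the learned attention becomes invariant and the keypoints equivariant.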
